




THE ADVANCED 


THEORY OF STATISTICS 



See also Vol. II of this book— 

THE ADVANCED THEORY OF 

STATISTICS VOL. II 

By M. G. Kendall, M.A. 

Contents: Estimation: likelihood—Estimation: miscellaneous methods—Confidence 
intervals—Fiducial inference—Some common tests of significance—Regression 
—The analysis of variance—The design of sampling inquiries—General theory of 
significance tests—Multivariate analysis—Time series—Appendices A and B— 
Bibliography—Index to Vol. II. 

Crown Quarto. Pp. ix + 521. 30 illustrations and 52 tables. 

Price 50s. 


Other books of interest— 

AN INTRODUCTION TO THE THEORY 
OF STATISTICS 

By G. Udny Yule, C.B.E., M.A., F.R.S., and 
M. G. Kendall, M.A. 

Contents: Notes on notation and on tables for facilitating statistical work— 
Introduction—Theory of attributes—Notation and terminology—Consistence of 
data—Association of attributes—Partial association—Manifold classification— 
Frequency-Distributions—Averages and other measures of location—Measures of 
dispersion—Moments and measures of skewness and kurtosis—Three important 
theoretical distributions: the Binomial, the Normal and the Poisson—Correlation 
—Normal correlation—Further theory of correlation—Partial correlation— 
Correlation: Illustrations and practical methods—Miscellaneous theorems involving 
the use of the correlation coefficient—Simple curve fitting—Preliminary notions on 
sampling—The sampling of attributes: Large samples—The sampling of variables: 
Large samples—The χ² distribution—The sampling of variables: Small samples— 
Interpolation and graduation—References—Tables—Answers to exercises—Index. 

Thirteenth edition. Medium 8vo. Pp. xiii + 570. With 55 
Diagrams and 4 Folding Plates. Price 24s. net. 


BIOMATHEMATICS 

Principles of mathematics for students of biological science 

By W. M. Feldman, M.D., B.S.(Lond.), F.R.S.(Edin.), F.R.C.S. 

Second edition, enlarged and re-set. Crown 8vo. Pp. xviii + 480. 
With many worked examples, and 164 Diagrams. Price 28s. net. 


CHARLES GRIFFIN & COMPANY, LTD. 




THE ADVANCED 

THEORY OF STATISTICS 


by 

MAURICE G. KENDALL, M.A. 

An Honorary Secretary of the Royal Statistical Society ; 
Assistant General Manager and Statistician to the Chamber 
of Shipping of the United Kingdom. 


VOLUME I 


With 16 Illustrations and 79 Tables 


THIRD EDITION 



LONDON 

CHARLES GRIFFIN & COMPANY LIMITED 
42 DRURY LANE 
1947 


[All Rights Reserved] 



Printed in Great Britain 
by Butler & Tanner Limited, Frome 



DEDICATED 
TO MY MOTHER 


“Let us sit on this log at the roadside,” says I, “and 
forget the inhumanity and ribaldry of the poets. It is in 
the glorious columns of ascertained facts and legalised 
measures that beauty is to be found. In this very log 
we sit upon, Mrs. Sampson,” says I, “is statistics more 
wonderful than any poem. The rings show it was sixty 
years old. At the depth of two thousand feet it would 
become coal in three thousand years. The deepest coal 
mine in the world is at Killingworth, near Newcastle. 
A box four feet long, three feet wide, and two feet eight 
inches deep will hold one ton of coal. If an artery is cut, 
compress it above the wound. A man’s leg contains thirty 
bones. The Tower of London was burned in 1841.” 

“Go on, Mr. Pratt,” says Mrs. Sampson. “Them ideas 
is so original and soothing. I think statistics are just as 
lovely as they can be.” 

O. Henry, The Handbook of Hymen. 






PREFACE 


The need for a thorough exposition of the theory of statistics has been repeatedly 
emphasised in recent years. The object of this book is to develop a systematic treatment 
of that theory as it exists at the present time. Originally my intention was to complete 
the work in one volume, but the war has made such a course impossible. Nevertheless, 
this first volume is largely complete in itself and can, I hope, be profitably read in advance 
of the publication of its successor. 

In 1938 Dr. M. S. Bartlett, Dr. J. O. Irwin, Professor E. S. Pearson, Dr. John Wishart 
and I discussed the possibility of writing a treatise on the theory of statistics in co-operation, 
and even got as far as sketching a synopsis. This proposal, however, had to be abandoned 
after the outbreak of war, and with some misgivings I decided to proceed alone. My present 
treatment differs very considerably from the one then agreed upon, since a number of 
sacrifices of viewpoint made for the purpose of reaching unanimity are no longer necessary. 
I must accordingly assume sole responsibility for the form and content of the present book, 
but acknowledgment is due to my colleagues for the helpful discussions which took place 
while the synopsis of the original proposal was being drafted. 

Apart from the usual problems arising in writing any book with pretensions to 
comprehensiveness (emphasis, rejection of unimportant material, sequence of presentation, 
and so forth), there were two main questions to be decided in regard to this book: the 
amount of mathematics admitted, and the point of introduction of the theory of probability. 
Statistical theory is essentially mathematical, and I have not hesitated (in fact I have 
been compelled) to adopt a rather advanced mathematical treatment in order to achieve 
rigour where it is attainable in the present state of our knowledge. Nevertheless I have 
tried (in places, perhaps, with indifferent success) to keep the mathematics to heel. 
This is intended to be a book on statistics, not on statistical mathematics. 

As to the place of the theory of probability, I have felt it preferable to deal with the 
descriptive properties of frequency-distributions before introducing the probability concept. 
This is justified both by the historical development of the subject and by the necessities 
of a logical presentation. Some readers may feel that the whole theory of modern statistics 
is so permeated with the sampling conception that an earlier introduction of probability 
would more than offset the loss in logical sequence by the gain in didactic force. This 
view I myself hold to be fundamentally wrong, but if the reader feels keenly on the subject 
he has, after all, merely to read Chapters 7 and 8 immediately after Chapter 1 and the 
difficulty is to a great extent resolved. 

The subjects covered by the present volume may be considered under three main 
heads. Chapters 1 to 6 deal with Frequency-Distributions and their properties. Chapters 
7 to 11 deal with the Theories of Probability and Sampling and with the Sampling 
Distributions to which they lead. Broadly, this section comprises the theory of those 
distributions which are derived from parent populations for special purposes such as inferences 
in probability, and may be termed the Theory of Derived Distributions. Chapters 13 to 16 
deal with the Theory of Correlation, considered as a measure of relationship, the general 
theory of regression analysis being left to the second volume. Chapter 12, on the 
χ²-distribution, is perhaps something of an intrusion in the development, but in view of the 
widespread applications of χ² in testing agreement between theory and observation I felt 
that it should be introduced at an early stage. 

The second volume will deal with the Theory of Estimation, Regression, Analysis 
of Variance, Tests of Significance, Multivariate Analysis, Theories of Statistical Inference, 
and Time Series. In the first volume it has been possible to avoid a detailed examination 
of controversial topics connected with the logic of inference in probability; the subject 
will be taken up more systematically in the second volume. 

On the invaluable principle that example is better than precept, a special effort has 
been made to exemplify the theory at every stage and to provide exercises for the reader 
to work out for himself. Some of the latter are rather difficult, but have nevertheless been 
included to illustrate the scope of application of the theory and to refer to results for which 
no place could conveniently be found in the text. In assembling this material I have 
drawn freely on the wealth of research work in statistical periodicals, particularly Biometrika, 
and am glad to make acknowledgment to the authors from whose papers examples have 
been taken. 

Foremost among my more specific indebtedness is that to Dr. Leon Isserlis, who 
read the whole book at the galley proof stage and to whose careful scrutiny I owe a great 
deal. I have also to thank Dr. J. O. Irwin, who allowed me to consult his draft of a chapter 
originally intended for the co-operative treatise (this forms the basis of Chapter 10); 
Professor R. A. Fisher and Messrs. Oliver and Boyd, for permission to reproduce Appendix 
Tables 4 and 5 from the former’s Statistical Methods for Research Workers; and the 
publishers, Messrs. Charles Griffin and Co., and the printers, Messrs. Butler & Tanner Ltd., 
who have taken great pains with some very difficult manuscript. 

I shall be grateful to any reader who notifies me of any error, omission or ambiguity, 
from which, I fear, no book of this kind can be entirely free at its first appearance. 

M. G. KENDALL. 

London, 

February 1st, 1943. 


NOTE TO SECOND AND THIRD EDITIONS 

In this and the second edition no alterations of substance have been made, but a number 
of misprints have been removed and a few references added to work published since the 
issue of the first edition. I am indebted to several correspondents for calling my attention 
to misprints and points where the presentation was ambiguous. 


November, 1946. 


M. G. K. 



TABLE OF CONTENTS 


CHAP. 

    Introductory Note 
 1. Frequency-distributions 
 2. Measures of location and dispersion 
 3. Moments and cumulants 
 4. Characteristic functions 
 5. Standard distributions (1) 
 6. Standard distributions (2) 
 7. Probability and likelihood 
 8. Random sampling 
 9. Standard errors 
10. Exact sampling distributions 
11. Approximations to sampling distributions 
12. The χ²-distribution 
13. Association and contingency 
14. Product-moment correlation 
15. Partial and multiple correlation 
16. Rank correlation 

Appendix Tables: 

 1. Frequency function of the normal distribution 
 2. Distribution function of the normal distribution 
 3. Distribution function of the t-distribution 
 4. 5 per cent. points of the z-distribution 
 5. 1 per cent. points of the z-distribution 
 6. Distribution function of χ² for one degree of freedom, χ² = 0 to 1 
 7. Distribution function of χ² for one degree of freedom, χ² = 1 to 10 

Appendix Diagram: Contour lines of the χ²-ν surface 

Index to Volume I 




















INTRODUCTORY NOTE 


0.1. The chapter-sections in this book are numbered serially. The serial numbers 
are prefixed by the number of the chapter in which they occur and are separated therefrom 
by a period, e.g. 14.13 refers to the thirteenth section of Chapter 14. A similar procedure 
is followed for tables, equations and exercises, e.g. (7.15) refers to the fifteenth equation 
of Chapter 7. In cross-references, chapter-sections are denoted by clarendon type, the others 
by ordinary type. 

0.2. References to printed work are given by author’s name and date of publication. 
In the list of references at the end of the chapter authors are arranged alphabetically. 
Where articles from publications are referred to, the number of the volume is given in 
clarendon type and the number of the first page of the article in ordinary type, e.g. Ann. 
Math. Statist., 10, 275, refers to the article beginning on page 275 of volume 10 of the Annals 
of Mathematical Statistics. Where an exercise is followed by an author’s name and a date, 
the result given in the exercise appears in the article listed in the references to the chapter 
concerned under these particulars. Where the result is from an article not previously 
referred to a full reference is given. 


0.3. The mathematical notation is that in current use, but a few symbols may be 
explained. 

(1) The exclamation mark ! written after an integer means the factorial of that integer. 
Some writers give the symbol a more extended use for non-integral numbers by writing 

$$x! = \Gamma(x + 1) = \int_0^\infty e^{-t}\, t^{x}\, dt.$$

This, of course, accords with the factorial notation, but will not be used in this book. 
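
As a numerical aside, the extended factorial mentioned in (1) can be checked against the ordinary factorial. The sketch below assumes Python's standard `math.gamma`; the helper name `generalized_factorial` is illustrative only and is not notation used in this book.

```python
import math


def generalized_factorial(x: float) -> float:
    """Return x! in the extended sense x! = Gamma(x + 1)."""
    return math.gamma(x + 1)


# For integers the extended definition agrees with the ordinary factorial.
for n in range(6):
    assert math.isclose(generalized_factorial(n), math.factorial(n))

# A non-integral argument: (1/2)! = Gamma(3/2) = sqrt(pi) / 2.
half_factorial = generalized_factorial(0.5)
```

The value of `half_factorial` can be compared with `math.sqrt(math.pi) / 2` to confirm the standard result for the Γ-function at 3/2.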

(2) The combinatorial sign $\binom{n}{j} = \dfrac{n!}{j!\,(n-j)!}$ will be used in place of the older ${}^{n}C_{j}$. 

(3) The summation sign will be written as $\Sigma$, e.g. 

$$\sum_{j=1}^{n} x_j = x_1 + x_2 + \dots + x_n.$$

The symbol $\sum_{j=1}^{n}$ can as a rule be shortened to $\sum^{n}$ and in many cases to $\sum_{j}$ or merely to 
$\Sigma$, the extent of the summation being clear from the context. 

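(4) The ordinary notation for the Γ-function (given above), the B-function, and the 
hypergeometric function will be used, i.e. 

$$B(p, q) = \int_0^1 x^{p-1} (1 - x)^{q-1}\, dx = \frac{\Gamma(p)\,\Gamma(q)}{\Gamma(p + q)}$$

and 

$$F(\alpha, \beta, \gamma, x) = 1 + \frac{\alpha\beta}{1 \cdot \gamma}\, x
  + \frac{\alpha(\alpha + 1)\,\beta(\beta + 1)}{1 \cdot 2 \cdot \gamma(\gamma + 1)}\, x^{2}
  + \frac{\alpha(\alpha + 1)(\alpha + 2)\,\beta(\beta + 1)(\beta + 2)}{1 \cdot 2 \cdot 3 \cdot \gamma(\gamma + 1)(\gamma + 2)}\, x^{3} + \dots$$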
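
The two definitions in (4) can be illustrated numerically. The following is a minimal sketch, not part of the original text: the function names, the midpoint-rule integration, and the number of series terms are all assumptions made for the example, and the series is only summed for |x| < 1.

```python
import math


def beta_via_gamma(p: float, q: float) -> float:
    """B(p, q) from the Gamma-function identity."""
    return math.gamma(p) * math.gamma(q) / math.gamma(p + q)


def beta_via_integral(p: float, q: float, steps: int = 20_000) -> float:
    """B(p, q) from the defining integral, using a simple midpoint rule."""
    h = 1.0 / steps
    total = 0.0
    for i in range(steps):
        x = (i + 0.5) * h
        total += x ** (p - 1) * (1 - x) ** (q - 1)
    return total * h


def hypergeometric(a: float, b: float, c: float, x: float, terms: int = 200) -> float:
    """Partial sum of the hypergeometric series F(a, b, c, x) for |x| < 1."""
    total, term = 1.0, 1.0
    for k in range(terms):
        # Ratio of successive terms: (a + k)(b + k) x / ((k + 1)(c + k)).
        term *= (a + k) * (b + k) / ((k + 1) * (c + k)) * x
        total += term
    return total


b_gamma = beta_via_gamma(2.5, 3.5)
b_integral = beta_via_integral(2.5, 3.5)
f_value = hypergeometric(1, 1, 2, 0.5)
```

Here `b_gamma` and `b_integral` should agree closely, and `f_value` can be compared with the known closed form F(1, 1, 2, x) = -log(1 - x)/x.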


(5) Where the exponent is concise, the exponential function will be written as a power 
of e, for example $e^{-\frac{1}{2}x^{2}}$. But where it is lengthy we shall use the notation exemplified by 

$\exp\{-\tfrac{1}{2}(x^{2} - 2\rho xy + y^{2})\}$ instead of $e^{-\frac{1}{2}(x^{2} - 2\rho xy + y^{2})}$. 



0.4. In some fields it is useful to preserve a distinction between a statistical parameter 
in a population and the estimate of that parameter from a sample. Where possible, the 
former will be denoted by a Greek letter and the latter by a Roman letter, e.g. the product- 
moment correlation coefficient of a population is denoted by ρ and that of a sample by r. 
It is not, however, always possible to preserve this distinction, as for instance with the 
multiple correlation coefficient R, in which case a Greek capital would be confused with the 
Roman P. Complete notational consistence can only be achieved at the expense of 
jettisoning a great deal of accepted statistical usage, and even then would probably 
result in some cumbrous symbols. 

0.5. In order to enable the reader to follow the worked examples and illustrative 
material, a few tables of functions commonly required are given at the end of this volume. 
These tables are in no way a substitute for the comprehensive sets which have been 
published and which are a necessary adjunct to most practical and a good deal of theoretical 
work. Frequent reference will be made to the following: 

Tables for Statisticians and Biometricians, edited by Karl Pearson, Parts I and II, Biometrika 
Office, University College, London, W.C.1. 

Statistical Tables for use in Biological, Agricultural and Medical Research, by R. A. Fisher 
and F. Yates, Oliver and Boyd, Edinburgh. 

The following are also useful: 

Tables of the Incomplete Γ-function, edited by Karl Pearson, Biometrika Office, University 
College, London, W.C.1. 

Tables of the Incomplete B-function, edited by Karl Pearson, Biometrika Office, University 
College, London, W.C.1. 

The Kelley Statistical Tables, by T. L. Kelley, Macmillan, London and New York. 

“Tables of Pearson’s Type III Function,” by L. R. Salvosa, Ann. Math. Statist., 1930, 
1, 191. 

Tables of the Higher Mathematical Functions, edited by H. T. Davis, Parts I and II, Principia 
Press, Bloomington, Indiana. 

Tables of Random Sampling Numbers, by L. H. C. Tippett, Tracts for Computers, No. 15, 
Cambridge University Press. 

Tables of Random Sampling Numbers, by M. G. Kendall and B. Babington Smith, Tracts 
for Computers, No. 24, Cambridge University Press. 

Tables of the Correlation Coefficient, by F. N. David, Biometrika Office, University College, 
London, W.C.1. 

Tables of tan⁻¹ x and log (1 + x²), by L. J. Comrie, Tracts for Computers, No. 23. 

Tables of the Probability Integral, by W. F. Sheppard, British Association Mathematical 
Tables, Vol. 7, Cambridge University Press. 

0.6. The references given at the end of the chapters are mainly intended to guide 
further reading and are not exhaustive. A more complete bibliography will be found in 
Mr. Yule’s and my Introduction to the Theory of Statistics, which contains about 700 
references to work appearing up to about 1932, and in the valuable periodic reviews of recent 
advances in theoretical statistics appearing in the Journal of the Royal Statistical Society 
and the Journal of the American Statistical Association. A recently-begun monthly 
publication by the American Mathematical Society, Mathematical Reviews, also contains material 
of interest in this connection. 



CHAPTER 1 

FREQUENCY-DISTRIBUTIONS 

Statistics as the Science of Populations 

1.1. Among the many subjects about which statisticians disagree is the definition 
of their science. In the Revue de l’Institut International de Statistique for 1935 (vol. 3, 
page 388) Dr. W. F. Willcox listed well over a hundred definitions of statistics, and the 
list was far from exhaustive. Even when we exclude those definitions which were formulated 
before the subject reached its present extent we are left with a variety of choices, and 
there is no definitive description of the scope of the science of statistics with which we 
can begin this book. 

1.2. The fundamental notion in statistical theory is that of the group or aggregate, 
a concept for which statisticians use a special word, “population.” This term will be 
generally employed to denote any collection of objects under consideration, whether 
animate or inanimate; for example, we shall consider populations of men, of plants, of 
mistakes in reading a scale, of barometric heights on different days, and even populations 
of ideas, such as that of the possible ways in which a hand of cards might be dealt. The 
notion common to all these things is that of aggregation. 

It is with the properties of populations that statistics is mainly concerned. In con¬ 
sidering a population of men we are not interested, statistically speaking, in whether some 
particular individual has brown eyes or is a forger, but rather in how many of the individuals 
have brown eyes or are forgers, and whether the possession of brown eyes goes with a 
propensity to forgery in the population. We are, so to speak, concerned with the properties 
of the population itself. Such a standpoint can occur in physics as well as in demographic 
sciences. For instance, in discussing the behaviour of a gas we are not so much interested 
in the behaviour of particular molecules, as in that of the aggregate of molecules which 
go to compose the gas. The statistician, like Nature, is mainly concerned with the species 
and is careless of the individual. 

1.3. We may therefore begin an approach to a definition of our subject by the 
following: statistics is the branch of scientific method which deals with the properties 
of populations. This, however, is rather too general. Statistics deals only with the 
numerical properties. A dictionary, for example, sets out a population of words, and 
among the properties of that population which are a suitable subject for scientific inquiry 
is that of word-derivation. It is not of statistical concern, however, to know that some 
words are derived from Latin, some from Anglo-Saxon and some from Hindustani. The 
subject would only assume a statistical aspect if we were to inquire how many words were 
derived from the different sources. 

1.4. As a second approximation to our definition we may then try the following: 

statistics is the branch of scientific method which deals with the data obtained by counting 
or measuring the properties of populations. 

This again is a little too general. A set of logarithm tables is a population of numerals, 
but it is hardly a subject for statistical inquiry, for every numeral is determined according 
to mathematical laws. The statistician is rather concerned with populations which occur 
in Nature and are thus subject to the multitudinous influences at work in the world at 
large. His populations rarely, if ever, conform exactly to simple mathematical rules, 
and in fact it is in the departure from such rules that he often finds topics of the greatest 
statistical interest. To allow for this factor we may then formulate our definition as 
follows: 

Statistics is the branch of scientific method which deals with the data obtained by 
counting or measuring the properties of populations of natural phenomena. In this 
definition “natural phenomena” includes all the happenings of the external world, whether 
human or not. 

This is as far as we need pursue the matter. The reader who is interested enough to 
look through the definitions listed by Dr. Willcox in the article referred to above will 
find, I think, that in the light of this definition there is a perceptible thread of continuity 
running through them. 

1.5. For the avoidance of misunderstandings in the interpretation of this definition 
it may be as well to point out that “statistics,” the name of the scientific method, is 
a collective noun and takes the singular. The same word “statistics” is also applied to 
the numerical material with which the method operates, and in such a case takes the 
plural. Later in this book we shall meet the singular form “statistic,” which is not, 
as might be supposed, an individual item of information which in the aggregate would 
compose “statistics,” but is the name given to an estimate of certain unknown measures 
of a population. 

Frequency-Distributions 

1.6. Consider a population of members each of which bears some numerical value 
of a variable, e.g. of men measured according to height or of flowers classified according 
to numbers of petals. This variable we shall call a variate. If it can assume only 
a number of isolated values it will be called discontinuous, and if it can assume any value 
of a continuous range, continuous. The population of members will then correspond to 
a population of variate-values, and it is the properties of this latter population which 
we have to consider. 

If the population consists of only a few members we can without much difficulty 
consider the population of variate-values exhibited by them; but if, as usually happens, 
the aggregate is large (or, in a sense defined later, infinite), the set of variate-values has 
to be reduced in some way before the mind can grasp their significance. This is done 
by classification of the individuals into ranges of the variate. So far as possible the ranges 
should be equal, so that the numbers falling into different ranges are comparable. The 
interval is called the class-interval (or simply the interval) and the number of members 
bearing a variate-value falling into a given class-interval is the class-frequency (or simply 
the frequency). The manner in which the class-frequencies are distributed over the 
class-intervals is called the frequency-distribution (or simply the distribution). 
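
The classification just described can be sketched in a few lines of code. This is an illustration only: the heights listed are invented for the example, and the function name, the starting point of 155 and the interval width of 5 are assumptions rather than anything taken from the text.

```python
from collections import Counter


def frequency_distribution(values, start, width):
    """Count values falling in equal class-intervals [start + k*width, start + (k+1)*width)."""
    counts = Counter(int((v - start) // width) for v in values)
    return {
        (start + k * width, start + (k + 1) * width): counts[k]
        for k in sorted(counts)
    }


# Hypothetical heights in centimetres (not data from the book).
heights_cm = [162.3, 171.0, 168.4, 175.9, 180.2, 169.9, 170.0, 158.7, 177.5, 172.1]

dist = frequency_distribution(heights_cm, start=155, width=5)
for (lower, upper), frequency in dist.items():
    print(f"{lower} and less than {upper}: {frequency}")
```

Each key of `dist` is a class-interval and each value the corresponding class-frequency; the frequencies necessarily sum to the number of members.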

1.7. Tables 1.1 and 1.2 give some frequency-distributions of observed populations 
classified according to a single variate. Table 1.1 shows the 1567 Local Government 
Areas of England and Wales distributed according to the variate “birth-rate.” Here, 
for example, there were 7 districts with a birth-rate of between 5.5 and 6.5 per thousand, 
and 271 with a birth-rate between 13.5 and 14.5 per thousand. The general nature of 


TABLE 1.1 

Showing the Number of Local Government Areas in England with Specified Birth-rates per 
Thousand of Population. 

(Material from the Registrar-General’s Statistical Review of England and Wales for 1933.) 

Birth-rate.                          Number of Districts with 
                                     Birth-rate in Specified Range. 

 1.5 and not exceeding  2.5                   1 
 2.5  „   „      „      3.5                   2 
 3.5  „   „      „      4.5                   2 
 4.5  „   „      „      5.5                   3 
 5.5  „   „      „      6.5                   7 
 6.5  „   „      „      7.5                   9 
 7.5  „   „      „      8.5                  14 
 8.5  „   „      „      9.5                  41 
 9.5  „   „      „     10.5                  83 
10.5  „   „      „     11.5                 131 
11.5  „   „      „     12.5                 192 
12.5  „   „      „     13.5                 242 
13.5  „   „      „     14.5                 271 
14.5  „   „      „     15.5                 190 
15.5  „   „      „     16.5                 127 
16.5  „   „      „     17.5                  89 
17.5  „   „      „     18.5                  78 
18.5  „   „      „     19.5                  37 
19.5  „   „      „     20.5                  21 
20.5  „   „      „     21.5                  17 
21.5  „   „      „     22.5                   4 
22.5  „   „      „     23.5                   4 
23.5  „   „      „     24.5                   2 

Total                                      1567 


TABLE 1.2 

Showing the Numbers of Persons in the United Kingdom liable to Sur-tax and Super-tax in 
the Year beginning 6th April 1931, classified according to the Magnitude of their 
Annual Income. 

(From the Statistical Abstract for the United Kingdom for the Years 1913 and 1919-32, 
Cmd. 4489.) 

Annual Income (£000).           Number of        Estimated Frequency 
                                Persons.         per £500 Interval. 

  2   and not exceeding   2.5     23,988              23,988 
  2.5  „   „      „       3       15,781              15,781 
  3    „   „      „       4       17,979               8,989 
  4    „   „      „       5        9,755               4,877 
  5    „   „      „       6        5,921               2,960 
  6    „   „      „       7        3,729               1,864 
  7    „   „      „       8        2,546               1,273 
  8    „   „      „      10        3,193                 798 
 10    „   „      „      15        3,616                 362 
 15    „   „      „      20        1,328                 133 
 20    „   „      „      25          679                  68 
 25    „   „      „      30          378                  38 
 30    „   „      „      40          372                  19 
 40    „   „      „      50          192                  10 
 50    „   „      „      75          182                   4 
 75    „   „      „     100           57                   1 
100 and over                          94                   ? 

Total number of persons           89,790                   — 













Fig. 1.1. Frequency Polygon of the Data of Table 1.1. (Horizontal axis: birth-rate per 
thousand of population.) 


the distribution is shown in this table in a way which would be quite impossible if each 
of the 1567 districts were shown separately. The greatest number of districts fall within 
the range 13.5-14.5 per thousand and the frequencies tail off on either side of this value. 
Table 1.2 shows the number of persons subject to sur-tax and super-tax in the United 
Kingdom in 1931 classified according to the variate “income.” The class-intervals here 
are unequal (a typical defect of official figures), and in the last column 
of the table is a reduction of the class-frequencies to comparability, namely, 
to frequency per £500 within the class-interval concerned. Looking at this 
column we see that the maximum frequency per £500 in this case is at the 
beginning of the frequency-distribution. 
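
The reduction to comparability described above is simple arithmetic: each class-frequency is divided by the width of its interval measured in units of £500. A minimal sketch follows, using a handful of rows from Table 1.2; the function name and the choice of rows are assumptions made for the illustration.

```python
def frequency_per_500(lower, upper, persons):
    """Frequency per £500 for a class running from `lower` to `upper` (in £000)."""
    width_in_500s = (upper - lower) / 0.5
    return persons / width_in_500s


# (lower limit, upper limit, number of persons), limits in £000, from Table 1.2.
rows = [(2, 2.5, 23988), (2.5, 3, 15781), (3, 4, 17979), (4, 5, 9755), (8, 10, 3193)]

reduced = [frequency_per_500(*row) for row in rows]
for (lower, upper, persons), value in zip(rows, reduced):
    print(f"{lower}-{upper}: {persons} persons, {value:.2f} per £500")
```

The raw frequency for 3-4 exceeds that for 2.5-3 only because the interval is twice as wide; the reduced figures fall steadily from one class to the next.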

1.8. The frequency-distribution may be represented graphically. 
Measuring the variate-value along the x-axis and frequency per class-interval 
along the y-axis, we erect at the abscissa corresponding to the 
centre of each class-interval an ordinate equal to the frequency per unit 
interval in that interval. The ends of these ordinates are joined by 
straight lines, one to the next. The diagram so obtained is called a 
Frequency Polygon. Fig. 1.1 shows the frequency polygon for the data 
of Table 1.1. 

As a variant of this procedure we may erect on the abscissa 
range corresponding to each class-interval a rectangle whose area 
is proportional to the frequency in that interval. A diagram 
constructed in this way is called a Histogram. Fig. 1.2 shows 
such a histogram for the data of Table 1.2. It is evident that 
the histogram is a more suitable form of representation when the 
class-intervals are unequal. 

Fig. 1.2. Histogram of the Data of Table 1.2. (Horizontal axis: annual income, £000.) 
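
The construction of the frequency polygon can be made concrete by computing its vertices. The sketch below is illustrative only: it takes five neighbouring classes from Table 1.1, and the function name is an assumption. Each vertex lies at the centre of a class-interval, at a height equal to the frequency per unit interval.

```python
def polygon_vertices(classes):
    """Vertices (centre of interval, frequency per unit interval) of a frequency polygon."""
    return [((lower + upper) / 2, frequency / (upper - lower))
            for lower, upper, frequency in classes]


# (lower limit, upper limit, frequency) for five classes of Table 1.1.
classes = [
    (11.5, 12.5, 192),
    (12.5, 13.5, 242),
    (13.5, 14.5, 271),
    (14.5, 15.5, 190),
    (15.5, 16.5, 127),
]

vertices = polygon_vertices(classes)
peak = max(vertices, key=lambda vertex: vertex[1])
```

Joining successive entries of `vertices` by straight lines gives the polygon. The same height, frequency divided by width, is also the height of the histogram rectangle on each interval, since its area must be proportional to the frequency.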





1.9. A few practical points in the tabulation of observed frequency-distributions 
may be noted. 

(1) It has been remarked that wherever possible the class-intervals should be equal. 
The importance of this will be more appreciated in subsequent chapters; but it is already 
evident that comparability is difficult to carry out by inspection when there exist inequalities 
in class-intervals. On running the eye down the second column of Table 1.2, for example, 
we note that the frequencies in intervals 3-4 and 8-10 are greater than in the immediately 
preceding intervals; but this is merely due to a change in the width of the intervals 
at those points and, as is seen from the third column, the frequency per unit interval 
decreases steadily. 

(2) It is important to specify the class-interval with precision. We not infrequently 
meet with such classifications as “0-10, 10-20, 20-30,” etc. To which interval is a member 
with variate-value 10 assigned? Obviously the classification is ambiguous if such values 
can in fact arise. We must either take the intervals “greater than or equal to 0 and less 
than 10, greater than or equal to 10 and less than 20,” or make it clear what convention 
we use to allot a variate-value falling on the border between two neighbouring intervals, 
e.g. it might be decided to allot one-half of the member to each. There are various ways 
of indicating the class-interval in practical tables, e.g. “10-, 20-, 30-” means “greater 
than or equal to 10 and less than 20,” and so forth. Sometimes, where a continuous 
variate is concerned, there is an element of imprecision in the specification of the fineness 
to which the measurements are made; for example, if we are measuring lengths in 
centimetres to the nearest centimetre, an interval shown as “greater than 15 and less than 
18” means an interval of “greater than 14.5 and less than 18.5.” When the precision 
of the measurements is known we can specify an interval by its middle point, for example, 
in this case, 16.5. 
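
The convention “greater than or equal to the lower limit and less than the upper” can be applied mechanically, which removes the ambiguity at the borders. The following sketch is an illustration under that assumed convention; the function names and the use of the standard `bisect` module are choices made for the example, not part of the text.

```python
from bisect import bisect_right


def class_index(value, boundaries):
    """Index of the class [b_k, b_(k+1)) that contains `value`."""
    if not boundaries[0] <= value < boundaries[-1]:
        raise ValueError(f"{value} lies outside the tabulated range")
    return bisect_right(boundaries, value) - 1


def true_limits(lower, upper, unit=1.0):
    """True limits of a recorded interval when measurements are rounded to the nearest `unit`."""
    half = unit / 2
    return (lower - half, upper + half)


boundaries = [0, 10, 20, 30]
labels = ["0-", "10-", "20-"]

assigned = labels[class_index(10, boundaries)]  # a border value goes to the upper class
limits = true_limits(15, 18)                    # lengths recorded to the nearest centimetre
midpoint = sum(limits) / 2
```

With these definitions a variate-value of exactly 10 falls in the class “10-”, and the recorded interval 15 to 18 has true limits 14.5 and 18.5 with middle point 16.5, as in the text.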


TABLE 1.3 

Showing the Number of Deaths from Scarlet Fever at Different Ages in England and Wales 
in 1933. 

(Data from Registrar-General’s Statistical Review of England and Wales for 1933, Tables, 
Part I, Medical.) 

Age in Years.     Number of Deaths.     Number per Year. 

     0-                  16                  16 
     1-                  69                  69 
     2-                  89                  89 
     3-                  74                  74 
     4-                  74                  74 
     5-                 213                  42.6 
    10-                  70                  14.0 
    15-                  27                   5.4 
    20-                  26                   5.2 
    25-                  17                   3.4 
    30-                  12                   2.4 
    35-                  11                   2.2 
    40-                  10                   2.0 
    45-                   6                   1.2 
    50-                   7                   1.4 
    55-                   5                   1.0 
    60-                   —                   — 
    65-                   1                   0.2 
    70-                   1                   0.2 
    75-                   1                   0.2 
    80-                   —                   — 

Total                   729 

The five single years of age 0- to 4- together account for 322 deaths. 








(3) Remark (1) about the importance of equality of class-intervals should not be held 
to preclude the specification of frequencies in finer intervals where the frequency is changing 
very rapidly. Table 1.3, for instance, shows the number of deaths from scarlet fever in 
England and Wales in 1933 according to the variate “age at death.” If the frequencies 
in the interval “0 and less than 5” were not subdivided and were thus shown as a total 
322 for the interval, we might draw the conclusion from the uniformly decreasing number 
of deaths as the variate increases that the greatest number of deaths occurred in the 
first year of life. This is not so, as is shown by the individual frequencies in the first 
five years. 
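
The point of remark (3) can be checked directly from the figures of Table 1.3. The sketch below is illustrative; the variable names are assumptions, and only the first five single years and the class “5-” are used.

```python
# Deaths in each single year of age 0 to 4, from Table 1.3.
single_years = {0: 16, 1: 69, 2: 89, 3: 74, 4: 74}

pooled_total = sum(single_years.values())   # the total that a single class "0-" would show
pooled_per_year = pooled_total / 5          # frequency per year if the class were not subdivided
next_class_per_year = 213 / 5               # the class "5-" covers five years of age

# Year of age with the greatest number of deaths among the first five.
modal_age = max(single_years, key=single_years.get)
```

Pooled, the first five years would show 322 deaths, more per year than any later class, which would suggest a steady fall from birth onwards. The subdivided figures show instead that the greatest frequency occurs in the third year of life (age 2-), not the first.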

(4) Perhaps it is hardly necessary to add that the histogram is not a suitable method 
of representing data classified according to discontinuous variates. It shows the class- 
frequency uniformly dispersed over the whole interval, whereas if the variate is 
discontinuous, frequencies must necessarily be concentrated at certain points. 


Frequency-Distributions : Discontinuous Variates 

1.10. It will be useful at this stage to give some examples of the frequency-distributions
which occur in practice.

Table 1.4 shows the distribution of digits in numbers taken from a four-figure telephone
directory. The numbers were chosen by opening the directory haphazardly and taking
the last two digits of all the numbers on the page except those in heavy type. The
distribution is irregular, but from a cursory inspection of the table we are inclined to suppose
that the digits occur approximately equally frequently in the larger population from
which these 10,000 members were chosen. We shall see later (p. 193) that the divergences
from the average frequency per digit, 1000, are not accidental sampling effects; but at this
stage it is sufficient to note that the data suggest for consideration a population of equally
frequent members.


TABLE 1.4

Showing Number of Different Digits chosen haphazardly from the London Telephone
Directory.

(M. G. Kendall and B. Babington Smith (1938), Jour. Roy. Statist. Soc., 101, 147.)

Digit.         0     1     2     3     4     5     6     7     8     9    Total.
Frequency.  1026  1107   997   966  1075   933  1107   972   964   853   10,000
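The divergences of Table 1.4 from the average frequency of 1000 can be set out numerically. The sketch below is an illustration only: the frequencies are those of the table, the names `observed`, `deviations` and `chi_square` are hypothetical, and the summary used (squared divergences divided by the average frequency, then summed) is one conventional measure chosen here as an assumption; the text defers its own test to p. 193.

```python
# Frequencies of the digits 0, 1, ..., 9 from Table 1.4.
observed = [1026, 1107, 997, 966, 1075, 933, 1107, 972, 964, 853]

total = sum(observed)
expected = total / len(observed)          # average frequency per digit

# Divergence of each digit from the average frequency.
deviations = [o - expected for o in observed]

# An assumed summary of the divergences: sum of (divergence)^2 / expected.
chi_square = sum(d * d / expected for d in deviations)

for digit, d in enumerate(deviations):
    print(digit, d)
print(round(chi_square, 3))
```

The largest single divergence is that of the digit 9, which falls short of the average by 147.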


Table 1.5 shows the distribution of a number of seed capsules of Shirley poppies
according to the variate “number of stigmatic rays.” The distribution in this case is
more regular, there being a maximum frequency at 13 and a steady decrease on either
side.


TABLE 1.5

Showing the Frequencies of Seed Capsules on certain Shirley Poppies with Different Numbers
of Stigmatic Rays.

(Cited from G. Udny Yule (1902), Biometrika, 2, 89.)

Number of           Number of Capsules with
Stigmatic Rays.     said Number of Stigmatic Rays.
      6                       3
      7                      11
      8                      38
      9                     106
     10                     152
     11                     238
     12                     305
     13                     315
     14                     302
     15                     234
     16                     128
     17                      50
     18                      19
     19                       3
     20                       1
    Total                  1905


In Table 1.6, on the other hand, showing suicides among women in some German
states in certain years according to the variate “number of suicides per year,” the
distribution reaches its maximum frequency in the region 1-3 suicides and then tails off
rather slowly.


TABLE 1.6

Showing Suicides of Women in Eight German States in Fourteen Years.
(Von Bortkiewicz, Das Gesetz der kleinen Zahlen, 1898.)

Number of Suicides
per year.            0    1    2    3    4    5    6    7    8    9   10 and over   Total.
Frequency.           9   19   17   20   15   11    8    2    3    5        3          112


Frequency-Distributions : Continuous Variates

1.11. Table 1.7 shows a number of adult males in the United Kingdom (including,
at the time of the collection of the data, the whole of Ireland), distributed according to
the variate “height in inches.” The frequency polygon is shown in Fig. 1.3. It will be
seen that the distribution is almost symmetrical, there being a maximum ordinate at
67- inches and a steady decrease in frequency on either side of the maximum.




TABLE 1.7

Showing the Frequency-distribution of Statures for Adult Males born in the United Kingdom
(including the whole of Ireland).

(Final Report of the Anthropometric Committee to the British Association, 1883, p. 256.)

As measurements are stated to have been taken to the nearest 1/8th of an inch, the class-intervals
are here presumably 56 15/16 to 57 15/16, 57 15/16 to 58 15/16, and so on.

Height without      Number of Men within
Shoes (inches).     said Limits of Height.
     57-                      2
     58-                      4
     59-                     14
     60-                     41
     61-                     83
     62-                    169
     63-                    394
     64-                    669
     65-                    990
     66-                   1223
     67-                   1329
     68-                   1230
     69-                   1063
     70-                    646
     71-                    392
     72-                    202
     73-                     79
     74-                     32
     75-                     16
     76-                      5
     77-                      2
    Total                  8585



Fig. 1.3. Frequency-distribution of the Data of Table 1.7. Values of the abscissa correspond
to the beginning of class intervals.

This more-or-less uniform “tailing off” of frequencies is very common in observed
distributions, but the symmetrical property is comparatively rare. Table 1.1 is roughly
symmetrical, but Tables 1.8 and 1.9, showing respectively a number of Australian marriages
distributed according to bridegroom’s age, and a number of dairy farms distributed
according to costs of production of milk, illustrate that various degrees of asymmetry
can occur. An extreme form is shown in Table 1.3.






TABLE 1.8

Showing Numbers of Marriages contracted in Australia, 1907-14, arranged according to the
Age of Bridegroom in 3-Year Groups.

(From S. J. Pretorius (1930), Biometrika, 22, 210.)

Age of Bridegroom
(Central Value of 3-year     Number of
Range, in years).            Marriages.
       16.5                      294
       19.5                   10,995
       22.5                   61,001
       25.5                   73,054
       28.5                   56,501
       31.5                   33,478
       34.5                   20,569
       37.5                   14,281
       40.5                    9,320
       43.5                    6,236
       46.5                    4,770
       49.5                    3,620
       52.5                    2,190
       55.5                    1,655
       58.5                    1,100
       61.5                      810
       64.5                      649
       67.5                      487
       70.5                      326
       73.5                      211
       76.5                      119
       79.5                       73
       82.5                       27
       85.5                       14
       88.5                        5
      Total                  301,785


TABLE 1.9

Showing Numbers of Dairy Farms in England and Wales according to Cost of Production
of Milk in 1935-6.

(Data from Costs of Milk Production in England and Wales, Interim Report No. 2,
Agricultural Economics Research Institute, Oxford.)

Cost of Production
(pence per gallon).    No. of Farms.
       4-                    4
       5-                    9
       6-                   34
       7-                   77
       8-                   94
       9-                   88
      10-                   65
      11-                   40
      12-                   15
      13-                    4
      14-                    5
      15-                    2
     Total                 437


In this connection Table 1.10, showing a number of men distributed according to weight,
is of interest for comparison with the height data of Table 1.7. The latter is symmetrical
but the former is not.





TABLE 1.10

Frequency-distribution of Weights for Adult Males born in the United Kingdom.

(Loc. cit., Table 1.7. Weights were taken to the nearest pound, consequently the true
class-intervals are 89.5-99.5, 99.5-109.5, etc.)

Weight in lbs.    Frequency.
     90-               2
    100-              34
    110-             152
    120-             390
    130-             867
    140-            1623
    150-            1559
    160-            1326
    170-             787
    180-             476
    190-             263
    200-             107
    210-              85
    220-              41
    230-              16
    240-              11
    250-               8
    260-               1
    270-               -
    280-               1
   Total            7749


1.12. When the asymmetry of a distribution such as that of Table 1.3 becomes
extreme we may be unable to determine whether, near the maximum ordinate, there
is a fall on either side, or whether the maximum occurs right at the start of the distribution.
This would have been the case in Table 1.3 if we had not the finer grouping for the first
five years of life; and it is the case in Table 1.2, in which the maximum frequency apparently
occurs at or very close to an income of £2,000 per annum. Asymmetrical distributions are
sometimes called “skew”; and those such as Table 1.2 are called “J-shaped.”

1.13. In rare cases the distribution may have maxima at both ends, as in Table 1.11,
showing a number of days distributed according to degree of cloudiness. This is known
as a U-shaped distribution.

TABLE 1.11

Showing the Frequencies of Estimated Intensities of Cloudiness at Greenwich during the
Years 1890-1904 (excluding 1901) for the Month of July.

(Data from Gertrude E. Pearse (1928), Biometrika, 20A, 336.)

Degrees of
Cloudiness.     Frequency.
    10             676
     9             148
     8              90
     7              65
     6              56
     5              45
     4              45
     3              68
     2              74
     1             129
     0             320
   Total          1716

1.14. Distributions also occur which in general appearance resemble sections of the
types already mentioned. A J-shaped distribution, for example, resembles the “tail”
of the symmetrical distribution of Table 1.7. The suicide data of Table 1.6 may be regarded
as a symmetrical distribution truncated just below the maximum ordinate by the impossibility
of the occurrence of negative values of the variate. This sort of conception is
sometimes useful in fitting curves to observed data: a given analytical curve may fit the
data quite well in a certain variate range, but may also extend into regions where the
data cannot, so to speak, follow it.

1.15. The distributions considered up to this point have one thing in common:
they have only one maximum or, in the case of the U-shaped curve, only one minimum.
Distributions also occur showing several maxima, Tables 1.12 and 1.13 being instances in
point. The first, showing a number of deaths according to age at death, is typical of
death distributions. Near the start of the distribution there is a maximum and a rapid
fall in the frequency; there is an indication of another maximum about the age 20-25; and
a pronounced maximum about the age 70-75, the frequencies beyond that point tailing
off to zero. It is natural to wonder whether such a distribution can be usefully considered
as three superposed distributions, a J-shaped distribution indicative of infantile mortality,
a more or less symmetrical single-humped distribution with a maximum at 20-25, indicative
of deaths at the adventurous age, and a skew distribution with a maximum at 70-75, the
ordinary death curve of senescence.


TABLE 1.12

Showing the Number of Male Deaths in England and Wales for 1930-32, classified by Ages
at Death.

(Data from Registrar-General’s Statistical Review of England and Wales, 1933, text.)

Age at Death
(years).          Number of Deaths.
     0-                97,290
     5-                11,532
    10-                 7,305
    15-                13,062
    20-                16,741
    25-                16,126
    30-                15,673
    35-                18,345
    40-                23,778
    45-                33,158
    50-                43,812
    55-                56,639
    60-                68,103
    65-                80,690
    70-                84,041
    75-                72,180
    80-                45,094
    85-                19,913
    90-                 5,145
    95-                   767
   100 and over            48
   Total              729,442
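The three maxima described in 1.15 can be located mechanically from the figures of Table 1.12. The sketch below is an illustration only; the helper `local_maxima` and the variable names are hypothetical, and a class is counted as a maximum simply when its frequency exceeds that of each neighbouring class.

```python
# Table 1.12: lower limits of the five-year age classes and the deaths in each.
ages = list(range(0, 105, 5))             # 0, 5, ..., 100 ("100 and over")
deaths = [97290, 11532, 7305, 13062, 16741, 16126, 15673, 18345, 23778,
          33158, 43812, 56639, 68103, 80690, 84041, 72180, 45094, 19913,
          5145, 767, 48]

def local_maxima(freqs):
    """Indices of classes whose frequency exceeds both neighbours.

    A class at either end is compared with its single neighbour only.
    """
    peaks = []
    for i, f in enumerate(freqs):
        left = freqs[i - 1] if i > 0 else float("-inf")
        right = freqs[i + 1] if i < len(freqs) - 1 else float("-inf")
        if f > left and f > right:
            peaks.append(i)
    return peaks

peak_ages = [ages[i] for i in local_maxima(deaths)]
print(peak_ages)
```

The classes found begin at ages 0, 20 and 70, in agreement with the maxima noted in the text.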






TABLE 1.13

Showing Number of Trypanosomes from Glossina morsitans classified according to Length
in Microns.

(From K. Pearson (1914-15), Biometrika, 10, 112. Length presumably to nearest micron.)

Length
(microns).     Frequency.
    15              7
    16             31
    17            148
    18            230
    19            326
    20            252
    21            237
    22            184
    23            143
    24            115
    25            130
    26            110
    27            127
    28            133
    29            113
    30             96
    31             54
    32             44
    33             11
    34              7
    35              2
   Total         2500


A similar dissection of a complex distribution could be undertaken for the data of
Table 1.13, showing a number of trypanosomes from the tsetse fly, Glossina morsitans,
classified according to length. We are led to suspect here that the distribution is composed
of the addition of several others (and this, by the way, has led to a suggestion that the
trypanosomes are a mixture of distinct types).

Frequency Functions and Distribution Functions

1.16. The examples given above illustrate the remarkable fact that the majority
of the frequency-distributions encountered in practice possess a high degree of regularity.
The form of the frequency polygons and histograms above suggests, almost inevitably,
that our data are approximations to distributions which can be specified by smooth curves
and simple mathematical expressions. This approach to the concept of the frequency
function, however, requires some care, particularly for continuous distributions.

Consider in the first place a discontinuous distribution such as that of Table 1.4. Let
us represent our variate by $x$. Then we may say that $x$ can take any of the ten values
0, 1, ... 9 and that the frequency of $x$, say $f(x)$, is given by the table, that is to say,
$f(0) = 1026$, $f(1) = 1107$, $f(2) = 997$, and so on. The frequency table, in fact, defines the
frequency function. Furthermore, most of the frequencies in the table are approximately
1,000, and we may then consider the observed distribution as approximating to that
defined by

$$f(x) = 1000, \qquad x = 0, 1, \ldots, 9 \tag{1.1}$$

or, more generally, to the distribution

$$f(x) = k, \qquad x = 0, 1, \ldots, 9 \tag{1.2}$$

This is perhaps the simplest case of a discontinuous frequency function, $f(x)$ being
a constant for all permissible values of $x$.





In Table 1.5 we have a discontinuous variate which can, theoretically, take an infinite
number of values, namely, any one of the positive integers. In practice, of course, there
must be a limit to the number of stigmatic rays which a poppy can possess, but since we
do not know that limit we may imagine our variate as infinite in range. The frequency
function for the table itself is again simply defined by the frequencies therein; but if we
wish to proceed to a conceptual generalisation of such a table we must admit a discontinuous
function $f(x)$ defined for all positive integral values of $x$. This occasions no difficulty
provided that we are able to attach some meaning to the total frequency, i.e. that
$\sum_{x=1}^{\infty} f(x)$ converges.

1.17. Consider now the case of a continuous variate. In the ordinary data of
experience our distributions are invariably discontinuous, because our measurements can
only attain a certain degree of accuracy. For instance, we are accustomed to suppose
that the height of a man may in reality be any real number of inches in a certain range,
say 50 to 80, such as $20\pi$. In fact, we can measure heights only to a certain accuracy,
say to the nearest thousandth of an inch. Our measurements thus consist of whole numbers
(of thousandths) from 50,000 to 80,000, and such a number as 62,831.85 ($= 20{,}000\pi$
approximately) cannot appear. All physical measurements are subject to this limitation,
but we accept it and nevertheless speak of our variables as “continuous,” the underlying
supposition being that the measurements are approximations to numbers which can
fall anywhere in the arithmetic continuum.

1.18. With this understanding we can consider the distribution of grouped frequencies
as leading to the concept of a frequency function for a continuous variate. If, in one
of the distributions above, say that of Table 1.7, we were to subdivide the intervals, we
should probably find that up to a point the resulting frequencies were smoother and smoother.
The reader can verify the appearance of this effect for himself by grouping the data of
Table 1.7 in intervals of 8, 4, and 2 inches. We cannot, however, take the process too
far, because, with a finite population, continued subdivision of the interval would sooner
or later result in irregular frequencies, there being only a few members in each interval.
But we may suppose that for ranges $\Delta x$, not too small, the distribution may be specified
by a function $f(x)\,\Delta x$, expressing that in the range $\pm\tfrac{1}{2}\Delta x$ centred at $x$ the frequency is
$f(x)\,\Delta x$, wherever $x$ may be in the permissible range of the variate. We may suppose further
that as $\Delta x$ tends to zero the population is perpetually replenished so as to prevent the
occurrence of small and irregular frequencies; and in this way we arrive at the concept
of the frequency function for a continuous variable. We write

$$dF = f(x)\,dx \tag{1.3}$$

expressing that the element of frequency $dF$ between $x - \tfrac{1}{2}dx$ and $x + \tfrac{1}{2}dx$ is $f(x)\,dx$, for
all $x$ and for $dx$, however small.
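The regrouping suggested in 1.18 is easily carried out by machine. The sketch below is an illustration only: the frequencies are those of Table 1.7, whereas the function `regroup`, its parameter `origin` (the point from which the wider intervals are counted, taken here as 56 inches) and the other names are assumptions introduced for the example.

```python
# Table 1.7: lower limits of the one-inch classes and the number of men in each.
heights = list(range(57, 78))
freqs = [2, 4, 14, 41, 83, 169, 394, 669, 990, 1223, 1329,
         1230, 1063, 646, 392, 202, 79, 32, 16, 5, 2]

def regroup(lower_limits, frequencies, width, origin):
    """Pool classes into intervals of the given width, counted from origin."""
    grouped = {}
    for lower, f in zip(lower_limits, frequencies):
        start = origin + width * ((lower - origin) // width)
        grouped[start] = grouped.get(start, 0) + f
    return grouped

# Frequency per inch, f(x), for intervals of 8, 4 and 2 inches.
for width in (8, 4, 2):
    table = regroup(heights, freqs, width, origin=56)
    print(width, {start: total / width for start, total in table.items()})
```

Dividing each pooled frequency by the width of its interval puts the groupings on a common footing, so that the successive tables can be read as approximations to the same $f(x)$.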

1.19. This admittedly somewhat intuitive approach to the concept of the continuous
frequency-distribution appears to be the best for statistical purposes, and is certainly
the way in which the concept was originally reached. In formulating the axioms and
postulates of a rigorous mathematical theory, however, the mathematician considers a
rather more general function. There is as yet no thorough formulation of the theory
required in this connection, and it would be alien to the primary purpose of this book to
attempt one, even if the space were available. We will merely indicate in broad outline
the general approach.

1.20. We consider a function $F$ which is defined at every point in a continuous
range and is continuous, except perhaps at a denumerable number of points. We require
that $F$ shall be zero at the lower point of the range (which may be $-\infty$) and a constant $N$
at the upper point (which may be $+\infty$) and that it shall not decrease at any point. Such
a function is called a Distribution Function. It corresponds to the cumulated frequency
of a frequency-distribution, $N$ being the total frequency; for example, in Table 1.4,
$F(x) = 0$ for $x < 0$, $F(x) = 1026$ for $0 \le x < 1$, $F(x) = 2133$ $(= 1026 + 1107)$ for
$1 \le x < 2$, and so on. Here there are ten points of discontinuity for $F(x)$. These points
are called “saltuses” (jumps) and $F(x)$ in this case is called a Step Function.
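The step function of Table 1.4 can be written down directly by cumulating the frequencies. The sketch below is an illustration only; it assumes that $F(x)$ counts the total frequency at values less than or equal to $x$, and the names `digits`, `cumulative` and `F` are hypothetical.

```python
from bisect import bisect_right
from itertools import accumulate

# Table 1.4: the ten saltuses (digits) and the frequency at each.
digits = list(range(10))
freqs = [1026, 1107, 997, 966, 1075, 933, 1107, 972, 964, 853]

# Cumulated frequencies: the height of F just after each saltus.
cumulative = list(accumulate(freqs))

def F(x):
    """Total frequency of digits less than or equal to x (assumed convention)."""
    k = bisect_right(digits, x)           # number of saltuses at or below x
    return cumulative[k - 1] if k else 0

for x in (-1, 0, 0.5, 1, 1.5, 9, 12):
    print(x, F(x))
```

The function is zero below the first saltus, constant between saltuses, and equal to the total frequency $N = 10{,}000$ from the last saltus onwards.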

If there is no saltus in the range, $F(x)$ is continuous and monotonically increasing.
If it possesses a derivative we have the equation in differentials

$$dF = F'(x)\,dx = f(x)\,dx \tag{1.4}$$

corresponding to (1.3). $f(x)$ is called the Frequency Function. The mathematics of this
branch of the subject is then that of the study of functions of the class $F(x)$ and $f(x)$.

1.21. The functions as thus defined are more general than those arrived at from the
statistical approach in two ways: (i) $F(x)$ can increase monotonically in part of the
range and then possess a saltus, i.e. the frequency may be continuous for a time and then
suddenly discontinuous; in statistical practice a variate is either continuous or discontinuous,
never both in different parts of the range; (ii) where no saltus exists $F(x)$ can
exist without there existing a frequency function, just as a continuous function need not
necessarily possess a derivative. In all the cases we shall consider, the existence of a
continuous variate will be accompanied by the existence of a frequency function.

The function $F(x)$ is sometimes called a Probability Function, for reasons which will
become evident in Chapter 7 when we consider the theory of probability. Essentially,
however, it has nothing to do with probability and we shall use the term “distribution
function” only.

1.22. If the discontinuous frequency function is $f(x)$, and $F(x)$ is taken to be the
total frequency less than or equal to $x$, we have

$$F(x) = \sum_{x_j \le x} f(x_j) \tag{1.5}$$

In the continuous case

$$F(x) = \int_a^x f(u)\,du \tag{1.6}$$

where the range is $a$ to $b$. We now introduce two conventions which simplify these expressions
to some extent. We shall suppose, unless the contrary is specified, that in these
mathematical expressions our frequencies are always expressed as proportions of the total
frequencies, so that the total frequency is unity and the sum or integral over the whole
range of the frequency function is also unity, i.e. $F(b) = 1$. Secondly, to avoid the
constant specification of the limits $a$ and $b$ we may, without loss of generality, suppose that
$F(x)$ and $f(x)$ are zero for any $x$ less than $a$, and that $F(x) = 1$ and $f(x) = 0$ for any $x$
greater than $b$. With this convention we may write

$$F(x) = \sum_{x_j = -\infty}^{x} f(x_j), \qquad \sum_{x_j = -\infty}^{\infty} f(x_j) = F(\infty) = 1 \tag{1.7}$$

and

$$F(x) = \int_{-\infty}^{x} f(u)\,du, \qquad \int_{-\infty}^{\infty} f(x)\,dx = F(\infty) = 1 \tag{1.8}$$


Where it is necessary to take account of the total frequency $N$ we may do so by multiplying
by $N$ the frequencies given by the frequency function. In our convention, $F(x)$ being the
total frequency less than or equal to $x$, $F(x)$ is always continuous on the right.


Excursus on Stieltjes Integrals

1.23. The distinction between discontinuous and continuous distributions, though
real and important for statistical purposes, is something of a nuisance in mathematical
investigations, and to avoid the necessity of stating all our theorems twice we shall use
a type of integral due to Stieltjes. In effect, this integral subsumes under one summatory
process the finite summation denoted by $\sum$ and the ordinary integral denoted by $\int$.

Suppose, in fact, that $F(x)$ is a distribution function as we have defined it. Let $\psi(x)$
be a continuous function in the range of $F(x)$, which we will take in the first instance to
be finite, $a$ to $b$. Divide the range into $n$ intervals at points $a = x_0, x_1, x_2, \ldots, x_n = b$.
Take $\xi_1$ in the range $a$ to $x_1$, $\xi_2$ in the range $x_1$ to $x_2$, and so on. Let

$$S = \psi(\xi_1)\{F(x_1) - F(a)\} + \psi(\xi_2)\{F(x_2) - F(x_1)\} + \ldots + \psi(\xi_n)\{F(b) - F(x_{n-1})\} \tag{1.9}$$

It may be shown that as the size of the intervals $x_{j+1} - x_j$ tends to zero uniformly,
$S$ tends to a limit which is independent of the location of the points $\xi$ or of the boundary
points of the intervals. We then write this limit

$$\int_a^b \psi(x)\,dF \tag{1.10}$$

and define it as the Stieltjes integral of $\psi(x)$ with respect to $F(x)$.

As for the case of ordinary integrals, we may now consider $a$ and $b$ as tending to infinity
and write, for example

$$\int_{-\infty}^{\infty} \psi(x)\,dF = \lim_{a \to -\infty,\; b \to \infty} \int_a^b \psi(x)\,dF$$

provided that the limit exists.

In particular, if $\psi(x) = 1$, we have the distribution function

$$F(x) = \int_{-\infty}^{x} dF$$




1.24. If $F(x)$ is the distribution function of a distribution possessing a continuous
frequency function, the Stieltjes integral becomes the ordinary integral

$$\int_a^b \psi(x) f(x)\,dx,$$

and thus includes ordinary integration as a particular case. If $F(x)$ is the distribution
function of a discontinuous distribution, that is to say, is a step function, a term such as
$\psi(\xi_j)\{F(x_j) - F(x_{j-1})\}$ will vanish unless there is a saltus in the range $x_{j-1}$ to $x_j$. The sum $S$ of
(1.9) must then tend to the limit (since it does tend to a limit) $\sum \psi(x_j) f(x_j)$, the summation
being taken over the saltuses $x_j$, i.e. to the
ordinary summation of a series. The Stieltjes integral thus also includes such summation
as a particular case.
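The two particular cases can be seen numerically by forming the sum $S$ of (1.9) for a fine subdivision. The sketch below is an illustration only. The function `stieltjes_sum`, the choice of each $\xi_j$ at the upper end of its interval, the rectangular distribution `F_rect` and the three-point step function `F_step` (with saltuses and proportions invented for the example) are all assumptions and are not taken from the text.

```python
def stieltjes_sum(psi, F, a, b, n):
    """Sum S of (1.9) on n equal intervals, xi_j taken at the upper end."""
    total = 0.0
    x_prev = a
    for j in range(1, n + 1):
        x_j = a + (b - a) * j / n
        total += psi(x_j) * (F(x_j) - F(x_prev))
        x_prev = x_j
    return total

def psi(x):
    return x * x

# Continuous case: rectangular distribution on (0, 1), so dF = dx there.
def F_rect(x):
    return min(max(x, 0.0), 1.0)

# Discontinuous case: hypothetical saltuses at 1, 2, 3 with these proportions.
jumps = [(1.0, 0.2), (2.0, 0.5), (3.0, 0.3)]

def F_step(x):
    return sum(p for point, p in jumps if point <= x)

s_cont = stieltjes_sum(psi, F_rect, 0.0, 1.0, 10_000)   # ordinary integral of x^2
s_step = stieltjes_sum(psi, F_step, 0.0, 4.0, 4_000)    # ordinary sum over saltuses
print(s_cont, s_step)
```

The first value approaches $\int_0^1 x^2\,dx = \tfrac{1}{3}$ and the second approaches $1(0.2) + 4(0.5) + 9(0.3) = 4.9$, the same routine serving for both.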

1.25. Many of the theorems of ordinary integration are true of the Stieltjes integral.
We shall frequently require the following:

$$\left| \int_a^b \psi\,dF \right| \le \int_a^b |\psi|\,dF \tag{1.11}$$

$$\int_a^b |\psi|\,dF \le M \int_a^b dF \le M \tag{1.12}$$

where $M$ is the upper bound of $|\psi(x)|$ in the range $(a, b)$.

$$\int_a^b \psi\,dF = \psi(\xi) \int_a^b dF \tag{1.13}$$

where $\xi$ is a value of $x$ in the range $(a, b)$.

If $a$ and $b$ are finite

$$\int_a^b \sum_{j=1}^{\infty} f_j(x)\,dF = \sum_{j=1}^{\infty} \int_a^b f_j(x)\,dF, \tag{1.14}$$

provided that $\sum f_j(x)$ converges uniformly in the range. The theorem is not necessarily
true if $a$ and $b$, or one of them, are infinite.

The ordinary rules of partial integration are also applicable to Stieltjes integrals.


Variate Transformations

1.26. Suppose we have a new variate $\xi$ related to $x$ by some functional equation

$$x = x(\xi), \tag{1.15}$$

$\xi$ being continuous and differentiable in $x$ throughout the range of $x$, and vice-versa. We
have then the equation in differentials

$$dx = \frac{dx}{d\xi}\,d\xi \tag{1.16}$$

Consequently, for a continuous distribution

$$F(x) = \int_{-\infty}^{x} f(x)\,dx = \int_{-\infty}^{\xi} f\{x(\xi)\}\,\frac{dx}{d\xi}\,d\xi,$$

and consequently we may write the distribution as

$$dF = f\{x(\xi)\}\,\frac{dx}{d\xi}\,d\xi \tag{1.17}$$

expressing that an element of frequency between $\xi$ and $\xi + d\xi$ is $f\{x(\xi)\}\,\dfrac{dx}{d\xi}\,d\xi$.

The equation determining the frequency function may then be transformed as if it were
an equation in differentials. Such transformations are important in the theory of continuous
distributions. By their means many mathematically specified distributions may
be reduced to known forms, either exactly or approximately.

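For example, a distribution which we shall have to study in the theory of sampling is

$$dF = \frac{1}{2^{(\nu-2)/2}\,\Gamma(\nu/2)}\,e^{-\chi^2/2}\,\chi^{\nu-1}\,d\chi, \qquad 0 \le \chi \le \infty.$$

It is readily verified by integration that $F(\infty) = 1$.

By the transformation $\dfrac{\chi^2}{2} = t$ we reduce this to

$$dF = \frac{1}{\Gamma(\nu/2)}\,e^{-t}\,t^{\nu/2 - 1}\,dt, \qquad 0 \le t \le \infty,$$

a well-known form in analysis, the distribution function being the incomplete $\Gamma$-function

$$F(t) = \frac{1}{\Gamma(\nu/2)} \int_0^t e^{-u}\,u^{\nu/2 - 1}\,du.$$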




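Again, the distribution

$$dF = \frac{y_0\,dt}{\left(1 + \dfrac{t^2}{\nu}\right)^{(\nu+1)/2}}, \qquad -\infty \le t \le \infty$$

($y_0$ being chosen so that $F(\infty) = 1$), a symmetrical peaked distribution of infinite range
rather like that of Fig. 1.3, may, by the substitution of $t = \sqrt{\nu}\tan\theta$, be transformed
into

$$dF = \frac{y_0 \sqrt{\nu}\,\sec^2\theta\,d\theta}{\sec^{\nu+1}\theta} = y_0 \sqrt{\nu}\,\cos^{\nu-1}\theta\,d\theta, \qquad -\frac{\pi}{2} \le \theta \le \frac{\pi}{2},$$

a distribution of finite range $-\dfrac{\pi}{2}$ to $+\dfrac{\pi}{2}$, also symmetrical. Putting now $\sin\theta = \xi$,
we have

$$dF = y_0 \sqrt{\nu}\,(1 - \xi^2)^{(\nu-2)/2}\,d\xi, \qquad -1 \le \xi \le 1,$$

and again, putting $\xi^2 = x$,

$$dF = y_0 \sqrt{\nu}\,(1 - x)^{(\nu-2)/2}\,x^{-1/2}\,dx, \qquad 0 \le x \le 1.$$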


The effect on the range of this last substitution is to be noted. $\xi$ ranges from $-1$ to $+1$,
and as it does so $x$ ranges from $+1$ to 0 and back to $+1$. The distribution function of
the $x$-distribution, $F(x)$ from 0 to $x$, is thus that of the $\xi$-distribution from $-\xi$ to $+\xi$,
where $\xi = \sqrt{x}$. Whenever substitutions are made under which there is not a (1,1) continuous relation
between the variates, points such as this require some watching.
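The behaviour of the range under these substitutions can be checked by numerical integration. The sketch below is an illustration only, written on the assumption that the second example reads $dF \propto (1 + t^2/\nu)^{-(\nu+1)/2}\,dt$ with the successive forms $\cos^{\nu-1}\theta\,d\theta$, $(1-\xi^2)^{(\nu-2)/2}\,d\xi$ and $(1-x)^{(\nu-2)/2}x^{-1/2}\,dx$. The constant factor $y_0\sqrt{\nu}$ is dropped where it is common to both sides, and the helper `midpoint`, the value $\nu = 4$ and the limits chosen are hypothetical.

```python
import math

def midpoint(f, a, b, n=100_000):
    """Midpoint-rule approximation to the integral of f from a to b."""
    h = (b - a) / n
    return h * sum(f(a + (i + 0.5) * h) for i in range(n))

nu = 4  # illustrative value of the constant nu (an assumption)

# (a) t = sqrt(nu) tan(theta) is a (1,1) relation: limits map one to one.
T = 2.0
theta_T = math.atan(T / math.sqrt(nu))
in_t = midpoint(lambda t: (1 + t * t / nu) ** (-(nu + 1) / 2), -T, T)
in_theta = math.sqrt(nu) * midpoint(lambda th: math.cos(th) ** (nu - 1),
                                    -theta_T, theta_T)

# (b) x = xi**2 is not (1,1): the frequency below x0 is the frequency of xi
# between -sqrt(x0) and +sqrt(x0).
x0 = 0.25
root = math.sqrt(x0)
in_xi = midpoint(lambda xi: (1 - xi * xi) ** ((nu - 2) / 2), -root, root)
in_x = midpoint(lambda x: (1 - x) ** ((nu - 2) / 2) * x ** -0.5, 0.0, x0)

print(in_t, in_theta)
print(in_xi, in_x)
```

In (b) the agreement is somewhat rougher than in (a), because the integrand in $x$ is unbounded at $x = 0$ and the simple midpoint rule converges slowly there.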


1.27. There is one variate transformation which is worth special attention. In
the distribution

$$dF = f(x)\,dx$$

put

$$\xi = \int_{-\infty}^{x} f(x)\,dx.$$

Then

$$dF = f(x)\,\frac{dx}{d\xi}\,d\xi = \frac{f(x)}{f(x)}\,d\xi = d\xi, \qquad 0 \le \xi \le 1, \tag{1.18}$$

so that the distribution is transformed into the very simple “rectangular” form in which
all values of the variate from 0 to 1 are equally frequent. Any continuous distribution
can be transformed into the rectangular form; and it follows that there exists at least
one transformation which will transform any continuous frequency-distribution into any
other continuous frequency-distribution, viz. the transformation which transforms one
into the rectangular form coupled with the reverse of that which transforms the other into
the rectangular form.
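A short numerical sketch may make the two-stage transformation concrete. Everything in it beyond the principle of (1.18) is an assumption made for illustration: the starting distribution $dF = e^{-x}\,dx$, the target distribution with distribution function $1/(1 + e^{-y})$, the sample size, the seed and all function names are hypothetical.

```python
import math
import random

def F_exponential(x):
    """Distribution function of dF = exp(-x) dx, x >= 0 (assumed example)."""
    return 1.0 - math.exp(-x)

def F_logistic(y):
    """Distribution function of the target distribution (assumed example)."""
    return 1.0 / (1.0 + math.exp(-y))

def F_logistic_inverse(xi):
    """Reverse of the transformation taking the target to rectangular form."""
    return math.log(xi / (1.0 - xi))

def exponential_to_logistic(x):
    xi = F_exponential(x)              # first stage: to the rectangular form
    return F_logistic_inverse(xi)      # second stage: reverse transformation

# The first stage alone should spread a sample evenly over (0, 1).
random.seed(1)
sample = [random.expovariate(1.0) for _ in range(10_000)]
counts = [0] * 10
for x in sample:
    xi = F_exponential(x)
    counts[min(int(xi * 10), 9)] += 1  # ten equal class-intervals on (0, 1)
print(counts)
```

For any positive $x$, the value $y$ returned by `exponential_to_logistic(x)` has the same cumulated proportion below it in the target distribution as $x$ has in the original, which is what is meant by transforming the one distribution into the other.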


The Genesis of Frequency-Distributions

1.28. Up to this point we have not inquired into the origin of the various observed
frequency-distributions which have been adduced in illustration. Certain of them may
be considered apart from any question of origination from a larger population. The death
distribution of Table 1.12 is an example; if we are interested only in the distribution of
male deaths in England and Wales in 1930-32 the whole of the population under consideration
is before us.

But in the great majority of cases the population which we are able to examine is only
part of a larger population on which our main interest is centred. The height distribution
of Table 1.7 is only a part of the population of men in the United Kingdom living at the
time of the inquiry, and it is mainly of importance in the light of the information which
it gives us about that population. Similarly the distribution of farms of Table 1.9 is largely
of interest in the information it gives about costs of milk production for the whole country.

1.29. In the two cases just mentioned, height and costs of milk production, we
have information about a certain sample of individuals chosen from an existing population.
Only lack of time and opportunity prevents us from examining the whole population.
It sometimes happens, however, that we have data which do not emanate from a finite
existent population in this way. Table 1.14 is an example. It shows the distribution of
throws with dice.





TABLE 1.14

Showing the Number of Successes (throws of 4, 5 or 6) with Throws of 12 Dice.
(Weldon’s data, cited by F. Y. Edgeworth, Encyclopaedia Britannica, 11th ed., 22, 39.)

Number of
Successes.     Frequency.
     0              0
     1              7
     2             60
     3            198
     4            430
     5            731
     6            948
     7            847
     8            536
     9            257
    10             71
    11             11
    12              0
   Total         4096


Now it is clear that, in a sense, we have not in these data got a complete population,
for we can add to them by further casting of the dice. But these further throws do not
exist in the sense that the unexamined men of the United Kingdom or the unexamined
dairy farms of England and Wales exist. They have a kind of hypothetical existence conferred
on them by our notion of the throwing of the dice.

Even distributions which appear at first sight to be existent may be considered in
this light. The trypanosome distribution of Table 1.13, for instance, was obtained from
certain tsetse flies. We may consider it as a sample of all the tsetse flies in existence,
whether harbouring trypanosomes or not, an existent population; but we may also
consider it as a sample of what the distribution would be if all the tsetse flies were infected
with trypanosomes, a hypothetical population.

The population conceived of as parental to an observed distribution is fundamental
to statistical inference. We shall take up this matter again in later chapters when we
consider the sampling problem. The point is mentioned here because it will occasionally
arise before we reach that chapter. It must be emphasised that the distinction between
existent and hypothetical universes is not merely a matter of ontological speculation (if
it were we could safely ignore it) but one of practical importance when inferences are
drawn about a population from a sample generated from it.


Multivariate Distributions

1.30. In the foregoing sections we have considered the members of a population
according to a single variate, and the frequency-distributions may thus be called univariate.
The work may be readily generalised to include populations of members considered according
to two or more variates, yielding bivariate, trivariate . . . multivariate frequency
distributions. Table 1.15, for example, shows the distribution of a number of beans
according to both length and breadth. The border frequencies show the univariate distributions
of the beans according to length and breadth separately, and the body of the
table shows how the two qualities vary together.




TABLE 1.15

Showing Frequencies of Beans with specified Lengths and Breadths.
(Johannsen’s data, cited by S. J. Pretorius (1930), Biometrika, 22, 110.)

Rows: breadth in millimetres (central values). Columns: lengths in millimetres (central values).

Breadth     17  16.5    16  15.5    15  14.5    14  13.5    13  12.5    12  11.5    11  10.5    10   9.5  Totals.
9.125        -     2     -     -     3     -     -     -     -     -     -     -     -     -     -     -       5
8.875        4     8    17    19     -     -     -     -     -     -     -     -     -     -     -     -      48
8.625        2    23   101   156    93    23     2     -     -     -     -     -     -     -     -     -     400
8.375        -    18   105   494   574   227    56     9     -     -     -     -     -     -     -     -    1483
8.125        -     4    44   375   956   913   362    73    12     3     -     -     -     -     -     -    2742
7.875        -     -     7    81   385   871   794   330    89    19     3     -     -     -     -     -    2579
7.625        -     -     1     4    65   236   469   361   175    55    27     4     -     -     -     -    1397
7.375        -     -     -     -     6    23    91   137   124    78    37    22    11     -     1     -     530
7.125        -     -     -     -     -     1    13    18    28    35    25    32    11     6     1     -     170
6.875        -     -     -     -     -     -     -     1     9     8    21    12    13     7     1     -      72
6.625        -     -     -     -     -     -     -     -     -     -     2     -     1     4     3     -      10
6.375        -     -     -     -     -     -     -     -     -     1     -     -     -     1     1     1       4
Totals       6    55   275  1129  2082  2294  1787   929   437   199   115    70    36    18     7     1    9440


As for the univariate case, the variates may be discontinuous or continuous and we sometimes
meet cases in which one variate is of one kind and one of the other.


1.31. In generalisation of the frequency polygon and the histogram we may construct 
3-dimensional figures to represent the bivariate distribution. Imagine a horizontal plane 
containing a pair of perpendicular axes and ruled like a chessboard into cells, the ruled 
lines being drawn at points corre8i>onding to the terminal points of class-intervals. At 






the centre of each interval we erect a vertical line proportional in length to the frequency 
in that interval. The summits of these verticals are joined, each to the four summits of 
verticals in the neighbouring cells possessing the same values of one or the other variate. 
The resulting figure is the bivariate frequency polygon or Stereogram. 

Similarly we may erect on each cell a pillar proportional in volume to the frequency 
in that cell and thus obtain a bivariate histogram. Fig. 1.4 shows such a figure for the 
bean data of Table 1.15.

1.32. We may write the bivariate distribution with variates $x_1$, $x_2$ as

$$dF = f(x_1, x_2)\,dx_1\,dx_2 \tag{1.19}$$

With the usual conventions we shall then have for the bivariate distribution function

$$F(x_1, x_2) = \int_{-\infty}^{x_1}\int_{-\infty}^{x_2} f(x_1, x_2)\,dx_1\,dx_2 \tag{1.20}$$

this integral also being understood in the Stieltjes sense, reducing to ordinary integration
if $f(x_1, x_2)$ is continuous and to ordinary summation if it is discontinuous.
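For a discontinuous (tabulated) distribution the double integral in (1.20) reduces to a double running sum over the cells of the frequency table. The following is a minimal sketch of that summation; the function name `bivariate_cdf` and the small table of proportional frequencies are illustrative assumptions and are not taken from the text.

```python
from itertools import accumulate


def bivariate_cdf(table):
    """Cumulate a bivariate frequency table into F(x1, x2).

    table[i][j] is the proportional frequency of cell (i, j); rows and
    columns are both assumed to be in increasing order of the variate.
    """
    # First cumulate along each row, then add the rows together going down.
    row_cum = [list(accumulate(row)) for row in table]
    running = [0.0] * len(table[0])
    cdf = []
    for row in row_cum:
        running = [r + v for r, v in zip(running, row)]
        cdf.append(running)
    return cdf


# Hypothetical 2 x 3 table of proportional frequencies (not from the text).
example = [[0.10, 0.20, 0.10],
           [0.05, 0.25, 0.30]]
F = bivariate_cdf(example)
```

The last entry of `F` is the total frequency, which is unity when proportional frequencies are used.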

Independence

1.33. If there are two distribution functions $F_1$, $F_2$ such that

$$F(x_1, x_2) = F_1(x_1)\,F_2(x_2) \tag{1.21}$$

then $x_1$ and $x_2$ are said to be independent. Where frequency functions exist we have

$$\int_{-\infty}^{x_1}\int_{-\infty}^{x_2} f(x_1, x_2)\,dx_1\,dx_2 = \int_{-\infty}^{x_1} f_1(x_1)\,dx_1 \int_{-\infty}^{x_2} f_2(x_2)\,dx_2$$

giving

$$f(x_1, x_2) = f_1(x_1)\,f_2(x_2) \tag{1.22}$$

It is readily seen that this definition of statistical independence conforms to the colloquial
use of the word and also to its mathematical use. The distribution of $x_2$ for any fixed $x_1$
(e.g. the distribution in a row or column of the bivariate frequency table) is the same
whatever the fixed value of $x_1$; that is to say, the distribution of $x_2$ is independent of $x_1$.

Two variates which are not independent are said to be dependent. Evidently those
of Table 1.15 are dependent, for the distributions in rows or in columns are far from similar.
Generally, $n$ variates are independent if

$$F(x_1, \dots, x_n) = F_1(x_1) \dots F_n(x_n).$$
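For a tabulated distribution, (1.22) says that every cell's proportional frequency equals the product of its row and column proportions. The sketch below checks this cell by cell; the function name `is_independent`, the tolerance and both small tables are hypothetical illustrations, not data from the text.

```python
def is_independent(table, tol=1e-9):
    """Check f(x1, x2) = f1(x1) * f2(x2) for every cell of a frequency table."""
    total = sum(sum(row) for row in table)
    row_tot = [sum(row) for row in table]          # marginal frequencies of x1
    col_tot = [sum(col) for col in zip(*table)]    # marginal frequencies of x2
    return all(
        abs(table[i][j] / total - (row_tot[i] / total) * (col_tot[j] / total)) <= tol
        for i in range(len(row_tot))
        for j in range(len(col_tot))
    )


# Hypothetical tables: in the first the rows are proportional to one another,
# so each row has the same distribution; in the second they clearly differ.
independent_table = [[2, 4, 6], [3, 6, 9]]
dependent_table = [[5, 1, 0], [0, 1, 5]]
```

Observed tables rarely satisfy the product rule exactly, so in practice one would look at how far the cells depart from it rather than apply a strict tolerance.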

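1.34. Transformations of the variate for bi- and multivariate distributions follow
the ordinary laws for the transformation of differentials. For example, if

$$dF = f(x_1, x_2)\,dx_1\,dx_2$$

and

$$x_1 = x_1(\xi_1, \xi_2), \qquad x_2 = x_2(\xi_1, \xi_2)$$

we have

$$dF = f\{x_1(\xi_1, \xi_2),\, x_2(\xi_1, \xi_2)\}\,J\,d\xi_1\,d\xi_2 \tag{1.23}$$

where $J$ is the Jacobian

$$J = \frac{\partial(x_1, x_2)}{\partial(\xi_1, \xi_2)} = \begin{vmatrix} \dfrac{\partial x_1}{\partial \xi_1} & \dfrac{\partial x_2}{\partial \xi_1} \\[2ex] \dfrac{\partial x_1}{\partial \xi_2} & \dfrac{\partial x_2}{\partial \xi_2} \end{vmatrix}$$

and is to be taken with a positive sign in (1.23).
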


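Consider, for example, the distribution

$$dF = z_0 \exp\left\{-\frac{1}{2(1-\rho^2)}\left(\frac{x_1^2}{\sigma_1^2} - \frac{2\rho x_1 x_2}{\sigma_1\sigma_2} + \frac{x_2^2}{\sigma_2^2}\right)\right\}dx_1\,dx_2, \qquad -\infty < x_1, x_2 < \infty \tag{1.24}$$

$z_0$, as usual, being chosen so that the total frequency is unity. The variates are evidently
dependent.

Put

$$\xi_1 = \frac{x_1}{\sigma_1}, \qquad \xi_2 = \frac{1}{(1-\rho^2)^{1/2}}\left(\frac{x_2}{\sigma_2} - \frac{\rho x_1}{\sigma_1}\right) \tag{1.25}$$

We have

$$\frac{\partial(\xi_1, \xi_2)}{\partial(x_1, x_2)} = \begin{vmatrix} \dfrac{1}{\sigma_1} & -\dfrac{\rho}{\sigma_1(1-\rho^2)^{1/2}} \\[2ex] 0 & \dfrac{1}{\sigma_2(1-\rho^2)^{1/2}} \end{vmatrix} = \frac{1}{\sigma_1\sigma_2(1-\rho^2)^{1/2}}$$

and

$$\xi_1^2 + \xi_2^2 = \frac{1}{1-\rho^2}\left(\frac{x_1^2}{\sigma_1^2} - \frac{2\rho x_1 x_2}{\sigma_1\sigma_2} + \frac{x_2^2}{\sigma_2^2}\right)$$

The distribution then becomes

$$dF = z_0\,\sigma_1\sigma_2(1-\rho^2)^{1/2}\exp\left\{-\tfrac{1}{2}(\xi_1^2 + \xi_2^2)\right\}d\xi_1\,d\xi_2 \tag{1.26}$$

The transformed variates $\xi_1$ and $\xi_2$ are thus independent.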
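The algebra of this example can be checked numerically. The sketch below assumes that the substitution reads $\xi_1 = x_1/\sigma_1$, $\xi_2 = (x_2/\sigma_2 - \rho x_1/\sigma_1)/(1-\rho^2)^{1/2}$ and that the exponent of (1.24) carries the factor $1/(1-\rho^2)$; the parameter values and test points are arbitrary choices made only for illustration.

```python
import math


def quadratic_form(x1, x2, s1, s2, rho):
    """Bracketed quadratic of (1.24), including the 1/(1 - rho^2) factor."""
    q = (x1 / s1) ** 2 - 2 * rho * x1 * x2 / (s1 * s2) + (x2 / s2) ** 2
    return q / (1 - rho ** 2)


def transform(x1, x2, s1, s2, rho):
    """Assumed substitution carrying (x1, x2) into (xi1, xi2)."""
    xi1 = x1 / s1
    xi2 = (x2 / s2 - rho * x1 / s1) / math.sqrt(1 - rho ** 2)
    return xi1, xi2


# Arbitrary illustrative parameters and points (not from the text).
s1, s2, rho = 2.0, 0.5, 0.6
points = [(1.0, 2.0), (-0.3, 0.7), (4.0, -1.5)]

# Largest discrepancy between the quadratic form and xi1^2 + xi2^2.
max_gap = max(
    abs(quadratic_form(x1, x2, s1, s2, rho)
        - sum(v * v for v in transform(x1, x2, s1, s2, rho)))
    for x1, x2 in points
)
```

Because the exponent becomes a sum of a function of $\xi_1$ alone and a function of $\xi_2$ alone, the frequency function factorises in the manner of (1.22).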


NOTES AND REFERENCES

The collection of definitions of statistics by Willcox (1935) has already been referred
to in the text.

Examples of practical frequency-distributions will be found in most statistical journals,
particularly Biometrika.

As to the mathematical basis of the theory of frequency-distributions, there appears
to be no account in English. The reader who is interested should, however, make a point
of reading two French works, that by Lévy and those by Fréchet in the Borel Traité. Both
these are written from the standpoint of the theory of probability, but the basic ideas of
the theory of frequency-distributions are the same whether probability is concerned or not.

Borel, E., Traité du Calcul des Probabilités, Gauthier-Villars, Paris. A series of brochures
written under the general editorship of M. Borel. See particularly the two by
M. Fréchet called "Nouveaux Recherches."

Lévy, P., Calcul des Probabilités, Gauthier-Villars, Paris.

Shohat, J. (1929), "Stieltjes Integrals in Mathematical Statistics," Ann. Math. Statist.,
1, 73.

Willcox, W. F. (1935), "Definitions of Statistics," Revue de l'Inst. Int. de Statistique, 3, 388.





EXERCISES 

1.1. Draw frequency polygons or histograms of the following distributions:

TABLE 1.16

Frequency-Distribution of Successes in Twelve Dice thrown 4096 Times, a Throw of 6 Points
reckoned as a Success.

(Weldon's data, loc. cit., Table 1.14.)

Number of Successes.     0     1     2     3     4     5     6    7 and over    Total.
Number of Throws.      447  1145  1181   796   380   115    24        8          4096


TABLE 1.17 

Frequency-Distribution of Size of Firms in the Food, Drink and Tobacco Trades of Great
Britain.

(Final Report of the Fourth Census of Production, 1930, Part III. The table shows the
number of firms employing, on an average, certain numbers of persons.)


Si/n of Filin (Aver- 

11-24 

1 ( 

25-49^50 99 100- 

200- 

300- 

400- 500- 

760 

1000-^ 

1500 

Total. 

Niurilnns Fin 
})Io\nd) 

1 


199 

.. 

299 

399 

499 749 

999 

1499 

and over 


Nmnb(U of Firms 

1 22451 

U49 771 

1 

439 

104 

1 

76 1 

! 

3h 54 

i 

31 

1 

23 1 

1 1 

29 

6310 


TABLE 1.18

Frequency-Distribution of Plots according to Yield of Grain in Pounds from Plots of 1/500th
Acre in a Wheat Field.

(Mercer and Hall (1911), Jour. Agr. Science, 4, 107.)


Yield of Gram m pounds 
per Aero. (Omtial 

value of ran^e). 

2 8 

3 0 

32 

34 

3 0 

3 8 

4 0 

4 2 

4 4 

4 o’ 

48 

5 0 

5 2 

Total 

Number of Plots 

' 1 

16 

20 { 

1 

47 

03 

78 

88 , 

69 

_1 

59 

35 

10 

i 

6 

4 

500 



TABLE 1.19

The Percentages of Deaf-mutes among Children of Parents One of whom at least was a
Deaf-mute, for Marriages producing Five Children or More.

(Compiled by G. Udny Yule from material in Marriages of the Deaf in America, ed. E. A.
Fay, Volta Bureau, Washington, 1898. Where a family fell on the border line between
two class-intervals one-half was assigned to each.)

Percentage of Deaf-mutes.    Number of Families.
0-20                         220
20-40                        20.5
40-60                        12
60-80                        5.5
80-100                       15
Total                        273


TABLE 1.20

Showing the Frequency-Distribution of Fecundity, i.e. the Ratio of the Number of Yearling
Foals produced to the Number of Coverings, for Brood-mares (Racehorses) covered Eight Times
at least.

(Pearson, Lee and Moore (1899), Phil. Trans., A, 192, 303. Where a case fell on the border
between two intervals, one-half was assigned to each.)

Fecundity.       Number of Mares with Fecundity between the Given Limits.
1/30- 3/30       2
3/30- 5/30       7.5
5/30- 7/30       11.5
7/30- 9/30       21.5
9/30-11/30       55
11/30-13/30      104.5
13/30-15/30      182
15/30-17/30      271.5
17/30-19/30      315
19/30-21/30      337
21/30-23/30      293.5
23/30-25/30      204
25/30-27/30      127
27/30-29/30      49
29/30-1          19
Total            2000.0





TABLE 1.21

Showing Numbers of Sentences of given Lengths in Passages from Macaulay's Essays on
Bacon and on Chatham.

(From G. Udny Yule (1939), Biometrika, 30, 363.)

Length of Sentence in Words.    Number of Sentences.
1-5                             46
6-                              204
11-                             252
16-                             200
21-                             186
26-                             108
31-                             61
36-                             68
41-                             38
46-                             24
51-                             20
56-                             12
61-                             8
66-                             2
71-                             4
76-                             8
81-                             2
86-                             2
91-                             1
96-                             2
101-                            1
106-                            -
111-                            1
116-                            -
121-                            1
Total                           1251


TABLE 1.22

Showing the Numbers of Old Egyptian Skulls with Specified Lengths of the Left Occipital
Bone in millimetres.

(From T. L. Woo (1930), Biometrika, 22, 324.)

Length (central values).    Frequency.
84.5                        12
86.5                        12
88.5                        32
90.5                        48
92.5                        79
94.5                        116
96.5                        104
98.5                        126
100.5                       123
102.5                       74
104.5                       68
106.5                       36
108.5                       18
110.5                       7
112.5                       4
114.5                       4
116.5                       -
118.5                       1
Total                       864






TABLE 1.23

Showing the Number of Women Aborting at Specified Term in Weeks.
(From T. Pearce (1930), Biometrika, 22, 260.)

Term (weeks).    Frequency.
4                3
5                7
6                10
7                13
8                14
9                29
10               22
11               21
12               18
13               28
14               16
15               19
16               10
17               13
18               14
19               8
20               4
21               2
22               10
23               4
24               4
25               3
26               4
27               6
28               1
Total            283


1.2. Sketch the following curves and compare their shapes with those of the
distributions in the previous exercise:


1 

II 

! 

8 

8 

!/ 

1 < X ^ 00 

jy = 2/oJ- 

y > 1, 0 ' .i’ 00 


(1 + 


y 

^ ?/„(! - zrx» 

a, b> 

1, 0 

.r < 1 

y 

= ytfi. ^x'' 

y 1. 

0 . X 

« 00 

y 

= «/o(l - a'®)" 

a - 0 , 

- ] 

X 1. 


1.3. Show that the following distributions can all be transformed into the type 


dF -:?/o(l Ux 

and find the transformations : 

dF - ro(l -- r^) dr 


0 < X < 1 

- 1 r < 1 



— 00 < ^ < CX) 


dF =-— 00 < e < 00 

(All these distributions are important in statistical theory. The distribution to which
they are reduced is called the Type I or B-distribution.)



(2) Yield of Milk per Week (Gallons) (Central Value of Interval ) 


1.4. Sketch the stereograms or bivariate histograms of the following distributions:

TABLE 1.24


Number of Families deficient in Room Space in 95 crowded London Wards.
(Census of 1931, Housing Report, p. xxxii.)



Standard Room Requirement (Rooms): 2, 3, 4, 5, 6, 7, 8; Totals.

Families deficient by

1 room 

12,999 

18,198

7,724 

2,170 

164 

19 


41,274 

2 rooms 

• 

3,054 

4,479 

1,448 

221 

15 

1 

i 9,218 

3 rooms 

• 

•• 

310 

508 

100 

4 

1 

929 

4 rooms



1 

1 

! 

4 


1 36 

Totals 

12,999 

21,252 

j 12,513 

4,130 

j 512 

42 

2 i 

61,466 


TABLE 1.25 

Number of Cows Distributed according to (1) Age in Years and (2) Yield of Milk per Week
in 4912 Ayrshire Cows.

(Data from J. F. Tocher (1928), Biometrika, 20B, 106.)


(1) Age in Years.



3 

4 

5 

(> 

7 

8 

9 

10 

11 

12 

13 

14 

15 

10 

17 

18 

3 OTALS 

8 

_ 

_ 

_ 


_ 

— 



1 

- 

— 

— 

__ 




1 

9 

_ 

2 

o 

— 

1 


— 

— 

— 

— 

— 

— 

— 


— 

— 

5 

10 

3 

5 

1 

1 

3 

— 


— 

— 

— 

— 1 

- 

— 

— 

—• 

— 

13 

11 

2 

10 

8 

7 

1 

— 

1 


2 

1 

- 

1 

— 


- 

- 

33 

12 

2 

25 

17 

9 

5 

4 

4 

2 

1 

1 

— 1 


— 

1 



71 

13 

<) 

70 

29 

18 

9 

2 

4 

1 

1 

1 


1 

— 


— 

— 

151 

U 

11 

70 

57 

38 

23 

'» 

7 


1 

2 

3 1 

— 


- 

__ 

— 

230 

15 

11 

115 

79 

43 

34 

24 

n 

8 

4 

5 

1 1 

2 

1 

- 

1 

- 

339 

10 

15 

149 

119 

74 

59 

23 

23 

10 

9 

7 

4 1 

— 

— 

— 

1 


199 

17 

1() 

148 

131 

94 

5S 

34 

32 

15 

12 

« 


— 

1 



- 

552 

18 

11 

140 

132 

83 

73 

49 

39 

22 

17 

( 

5 

1 

1 


— 


585 

19 

10 

117 

112 

113 

87 

51 

35 

33 

11 

10 

2 1 

3 

1 


~ 

1 

580 

20 

8 

07 

107 

79 

()9 

51 

25 

30 

13 

19 

:i 1 

3 

— 

— 

1 


490 

21 

3 

t»3 

93 

88 

70 

49 

31 

29 

9 

7 

^ 1 


1 

— 

1 


448 

22 

5 

42 

03 

49 

45 ' 

32 

14 

1 

10 

3 

1 

2 

— 


— 

— 1 

284 

23 

1 

19 

33 

38 

1 38 

27 

17 

17 

12 

7 

1 

2 

2 

— 

- 

j — 

214 

24 

2 

20 

23 

34 

27 

19 

i:i 

i 9 

i 

2 

• 1 ' 

— 

— 


— 

I — 

153 

25 

3 

10 

15 1 

22 

17 

20 

8 

1 10 ' 

' 3 

4 

— 1 

““ 

— 

— 


— 

1 112 

20 


1 7 

n ' 

7 

1 4 

15 1 

1 2 

4 

2 

1 ^ 

1 

- 

— 

— j 


— 

1 58 

27 

_ 

2 

7 

9 

6 

5 

4 

*> 

— 


- 


— 

1 

- 

1 — 

i 35 

28 

_ j 

1 _ 

2 

1 

4 

2 1 

1 

1 

2 

— 

— 

— 

- 

— 


1 — 

13 

29 

__ 

_ 

2 

2 

' 4 

1 

1 ^ i 

— 

3 

1 -- 

1 > 1 

— 

— 

- 

— 

— 

15 

30 

— 

1 

— 

— 

— 

o ' 

1 1 
1 2l 

1 _ 

* 2 



— 

— 

— 

— 

— 

4 

31 

— 

— 

2 

1 

1 — 

j 

— 


1 — 

1 — 1 

— 

— 

1 — 

—, 

— 

5 

32 

— 

— 

— 1 

1 ^ 

— 


I-1 

— 

j 

— 

i i 

— 

— 

1 

—‘ 


1 

33 

_ 

— 

1 

— 

— 

— 


1 — 

1 

1 — 

— 

— 

1 

— 

- 

34 


— 

— 

— 


1 

_ - 1 

1 — 


— 


—, 


-- 

1 “ 


Toialr I 

1 ; 

1129 

1 1047 

812 

1 03(> 

419 

' 270 

223 J 

122 

75 

1 ,32 

15 

7 


4 

1 ' 

1 4912 








TABLE 1.26 


Distribution of Weekly Returns according to (1) Call Discount Rates and (2) Percentage of
Reserves on Deposits in New York Associated Banks.

(From Statistical Studies in the New York Money Market, by J. P. Norton. Publications 
of the Department of the Social Sciences, Yale University; The Macmillan Co., 1902.) Note 
that, after the column headed 8 per cent., blank columns have been omitted to save space. 

(1) Call Discount Rates. 



CM 



L!_ 

jj.s 

' 2 

j25 

/ 3 

3 5 

4 

4 5 

1 

5 5 

L 

f 6 

6 5 

1 ^ 

7 5 

/ s 

/ 21 

22 

23 

24 

25 

L- 

Mill 

— 


i- 

1 

2 

1 

6 

/ J 

1 

4 

l£ 

4 

Z/ 

2 

11 1 

1 

1- 

2 

- 

: 

1 

n 

3 

6 

26 

— 

— 

- 


2 

6 

13 

12 

16 

6 

11 1 

4 

7 


— 

27 

— 

1 

10 

9 

14 

12 

16 

17 

19 

9 

^ 1 

3 

4 

_ 

1 

28 

— 

5 

30 

23 

20 

11 

7 

3 

7 

1 

2 , 

2 

3 

_ 

— 

2ft 

3 

9 

48 

17 

16 

3 

6 

3 

1 

— 

_ 1 

_ 



_ 

30 

1 

12 

12 

10 

S 

4 

4 


2 

— 



— 


_ 

31 

8 

10 

6 

2 

4 

2 

2 

1 

1 

— 

- 

— 

— 


_ 

32 

15 

14 

10 

8 

5 

— 


1 

__ 

— 

— 


_ 


_ 

:u 

16 

8 

4 

1 

— 

1 

2 

1 


— 

- 

_ 

_ 


_ 

34 

1 2 1 

11 

1 

— 




— 

— 

— 

_ 


_ 

_ 

_ 

35 

8 ! 

5 

1 

— 


— 

1 — 


— 



_ 

_ 

_ 


36 

7 : 

2 

1 

— 

1 _ 


— 

— 

.... 



_ 


( - 


37 

8 

— 

I 

— 

1 

1 — 

— 


— 


— 

— 


_ 


38 

9 

1 

1 

— 

^ _ 


1 — 

— 



_ 

_ 

1 

_ 


3ft 

19 

2 

— 

— 


— 

1 

1 - 

_ 

— 




_ 


40 

7 

8 


' _ 

— 


— 

— 


— 

- 

_ 


1 _ 

_ 

41 

42 ! 

43 

44 

45 

7 

8 

1 

1 

2 

3 

2 

— 

1 - 

i 3 

— 

1 

- 

—* 

! _ 

1 - 

— 


— 

1 - 

ir 

Totals 

121 

1 93 

12'> 

* 70 

(19 

1 40 

1 52 

15 

62 

20 

35 

10 1 

18 


10 


9 If) j 12 /15 fuo 1 25 Totals^ 


1 

— 

F 

F 


— 

2 

1 


2 


— 


1 1 

1 

9 

1 

2 

1 

1 

— 

— 

42 

2 

2 

— 

1 

1 

2| 

85 

_ 

— 

— 

1 



124 

_ 

1 

_ 



— 

115 


“ 



— 

1 

109 

63 

36 


- 

— 



— ^ 



— 



— 

53 



_ 



— 

32 

— 

— 



- 


14 






— 

14 

10 




— 



11 

— 

1 — 

— 

— 

— 


21 

— 

1 


- 


— 

15 


— 

— 



— 

10 


—- 

j_ 


- 

— 

10 

1 



1 

! 

- 

~ 

1 

2 

4 

7 

1 


1 

1 

780 


1.5. Show that the conditions that the function

$$f(x_1, x_2) = z_0 \exp\{A x_1^2 + 2H x_1 x_2 + B x_2^2\}, \qquad -\infty < x_1, x_2 < \infty$$

may represent a frequency function are

(a) $A < 0$
(b) $B < 0$
(c) $AB - H^2 > 0$.

Show further that if these conditions are satisfied and the integral of $f(x_1, x_2)$ between
$-\infty$ and $\infty$ for both variates is unity, then

$$z_0 = \frac{1}{\pi}\begin{vmatrix} -A & H \\ H & -B \end{vmatrix}^{1/2}$$


CHAPTER 2 

MEASURES OF LOCATION AND DISPERSION 


2.1. It has been seen in Chapter 1 that the frequency-distributions occurring in
statistical practice vary considerably in general nature. Some are finite in range and
some are not. Some are symmetrical and some markedly skew. Some present only
a single maximum and others present several. Amid this variety we may, however,
discern four general types: (a) the symmetrical distribution with a single maximum,
such as that of Table 1.7; (b) the asymmetrical distribution, or skew distribution, with
a single maximum, such as those of Tables 1.8 and 1.9; (c) the extremely skew, or J-shaped,
distribution, such as that of Table 1.2; and (d) the U-shaped distribution, such as that
of Table 1.11. To make this classification comprehensive we should have to add a fifth
class comprising the miscellaneous distributions not falling into the other four.

The distributions with a single maximum will hereafter be called "unimodal." The
synonymous terms "cocked-hat," "single-humped" and one or two others also occur in
statistical literature.

2.2. It frequently happens in statistical work that we have to compare two distributions.
If one is unimodal and the other J-shaped or multimodal a concise comparison
is clearly difficult to make, and in such a case it would probably be necessary to specify
both distributions completely. But if both are of the same type (and it is in such cases
that comparisons most frequently arise) we may be able to make a satisfactory comparison
merely by examining their principal characteristics; e.g. if both are unimodal it might
be sufficient to compare (a) the whereabouts of some central value, such as the maximum
(this, as it were, locates the distributions); (b) the degree of scatter about this value
(the dispersion); and (c) the extent to which the distributions deviate from the
symmetrical (the skewness).

The same point emerges when our distributions are specified by some mathematical
function. If, for example, we have two distributions of the type

$$dF = y_0\,e^{-\frac{(x-m)^2}{2v}}\,dx,$$

symmetrical about $x = m$, a complete comparison can be made by comparing the value
of the constants $m$ and $v$ in the distributions. Such constants are called parameters of
the distribution. This chapter is devoted to a discussion of parameters of location and
dispersion.

Measures of Location: the Arithmetic Mean

2.3. There are three groups of measures of location in common use: the means
(arithmetic, geometric and harmonic), the median and the mode. We consider them in turn.

The arithmetic mean is perhaps the most generally used statistical measure, and in
fact is far older than the science of statistics itself. If the proportional frequency of the
values $x$ of a distribution is $f(x)$, the arithmetic mean about the point $x = a$ is defined by

$$\mu'_1 = \int_{-\infty}^{\infty} (x - a)\,f(x)\,dx \tag{2.1}$$

This integral is to be understood in the Stieltjes sense and hence includes summation in
the discontinuous case; e.g. the arithmetic mean of a set of discrete values $x$ is their sum
divided by the number of values. In formula (2.1) the frequency, in accordance with
our usual convention, is expressed as a proportion of the total frequency. If the actual
frequencies are $g(x)$, totalling $N$, we have

$$\mu'_1 = \frac{1}{N}\int_{-\infty}^{\infty} (x - a)\,g(x)\,dx$$

in the continuous case, and

$$\mu'_1 = \frac{1}{N}\sum_{-\infty}^{\infty} (x - a)\,g(x)$$

in the discontinuous case. The value of the arithmetic mean thus depends on the value
of $a$, the point from which it is measured. For a mathematically specified distribution
the integral (2.1) need not necessarily converge, in which case no arithmetic mean exists.


2.4. The calculation of the arithmetic mean of a numerically specified distribution
(i.e. one whose frequency-distribution is given in the form of a numerical table like those
of Chapter 1) is a simple process. If there are relatively few values in the population
we merely sum them and divide by their total number $N$. If they are given in the form
of a frequency table a more formal procedure is desirable, but the principle is exactly the
same. The following example will make the process clear.


Example 2.1

To calculate the arithmetic mean of the population of males distributed according to
height of Table 1.7.

Let us note first of all that if $b$ is some other arbitrary variate-value,

$$\mu'_1 \text{ (about } a) = \int_{-\infty}^{\infty} (x - a)\,dF = \int_{-\infty}^{\infty} (x - b)\,dF + (b - a)\int_{-\infty}^{\infty} dF = \mu'_1 \text{ (about } b) + b - a \tag{2.2}$$

In other words, we can find the mean about any point very simply when we know
the mean about any other. In calculating the arithmetic mean we can then take an
arbitrary point as origin and transfer to any other desired point afterwards. It is generally
convenient to choose this arbitrary point somewhere near the maximum frequency.

One further point arises in grouped data such as these. We do not know exactly the
variate-values of the individuals within a certain class range. We therefore assume them
concentrated at the centre of the interval. Corrections for any distortion thus introduced
will be considered in Chapter 3. In fact, no correction is required for the arithmetic
mean in the case when the frequency "tails off" at both ends of the distribution.

In the particular case before us we take an arbitrary origin at the centre of the interval
67- inches, i.e. at the point 67 7/16 inches, and measure $\xi\ (= x - a)$ from that point. Column 2
in Table 2.1 shows the frequency, column 3 the value of $\xi$ and column 4 the value of $\xi f$.
We find, having due regard to sign,

$$\Sigma(\xi f) = 8763 - 8584 = 179.$$

Hence the mean about $x = 0$ is

$$67\tfrac{7}{16} + \frac{179}{8585} = 67.46 \text{ inches.}$$


TABLE 2.1

Calculation of the Arithmetic Mean for the Distribution of Table 1.7.

(1)               (2)              (3)                          (4)
Height, inches.   Frequency f.     Deviation from Arbitrary     Product ξf.
                                   Value ξ.
57-                  2             - 10                            20
58-                  4             -  9                            36
59-                 14             -  8                           112
60-                 41             -  7                           287
61-                 83             -  6                           498
62-                169             -  5                           845
63-                394             -  4                          1576
64-                669             -  3                          2007
65-                990             -  2                          1980
66-               1223             -  1                          1223
67-               1329                0                        - 8584
68-               1230             +  1                          1230
69-               1063             +  2                          2126
70-                646             +  3                          1938
71-                392             +  4                          1568
72-                202             +  5                          1010
73-                 79             +  6                           474
74-                 32             +  7                           224
75-                 16             +  8                           128
76-                  5             +  9                            45
77-                  2             + 10                            20
Totals            8585                -                        + 8763
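The working of Table 2.1 can be repeated in a few lines. The sketch below uses the frequencies of the table and, as an assumption consistent with the quoted mean of 67.46 inches, takes the centre of the 67- class to be 67 7/16 inches; the variable names are illustrative only.

```python
from fractions import Fraction

# Frequencies of Table 2.1 for the classes 57-, 58-, ..., 77- inches.
freq = [2, 4, 14, 41, 83, 169, 394, 669, 990, 1223, 1329,
        1230, 1063, 646, 392, 202, 79, 32, 16, 5, 2]
classes = range(57, 78)

origin_class = 67                               # arbitrary origin: the 67- class
centre = Fraction(67) + Fraction(7, 16)         # assumed centre of that class, in inches

# Deviations are whole numbers of class-intervals measured from the origin class.
deviation_sum = sum((c - origin_class) * f for c, f in zip(classes, freq))
total = sum(freq)

# Transfer from the arbitrary origin back to x = 0, as in (2.2).
mean = centre + Fraction(deviation_sum, total)
print(float(mean))
```

Working in exact fractions avoids any rounding until the final conversion; any other class could serve as origin and, by (2.2), would lead to the same mean.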


Example 2.2

For a distribution specified by a mathematical function, the determination of the
mean is a matter of evaluating the integral (2.1), when it exists. For instance, to find
the mean of the distribution

$$dF = \frac{1}{B(p, q)}(1 - x)^{p-1}x^{q-1}\,dx, \qquad 0 \le x \le 1$$

we have

$$\mu'_1 = \frac{1}{B(p, q)}\int_0^1 (1 - x)^{p-1}x^{q}\,dx = \frac{B(p, q+1)}{B(p, q)} = \frac{\Gamma(p)\,\Gamma(q+1)}{\Gamma(p+q+1)}\cdot\frac{\Gamma(p+q)}{\Gamma(p)\,\Gamma(q)} = \frac{q}{p+q}$$

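The closed form $q/(p+q)$ for the mean of the distribution $dF = (1-x)^{p-1}x^{q-1}\,dx / B(p,q)$ can be confirmed by evaluating the integral (2.1) numerically. The following is only a sketch: the midpoint rule, the number of steps and the particular values of $p$ and $q$ are arbitrary choices, not part of the text.

```python
import math


def beta_fn(p, q):
    """B(p, q) expressed through gamma functions."""
    return math.gamma(p) * math.gamma(q) / math.gamma(p + q)


def mean_by_quadrature(p, q, steps=20_000):
    """Midpoint-rule estimate of the mean of (1-x)^(p-1) x^(q-1) / B(p, q) on (0, 1)."""
    h = 1.0 / steps
    total = 0.0
    for k in range(steps):
        x = (k + 0.5) * h
        total += x * (1 - x) ** (p - 1) * x ** (q - 1)
    return total * h / beta_fn(p, q)


p, q = 3.0, 2.0                      # illustrative parameter values
numeric = mean_by_quadrature(p, q)
closed_form = q / (p + q)
```

For $p$ or $q$ below unity the integrand is unbounded at an end of the range and a plain midpoint rule converges slowly, so the check is best made with both parameters at least 1.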


2.5. Apart from its relative simplicity and ease of calculation, qualities which ensure
it a firm place in the elementary theory of statistics, the arithmetic mean has a number
of properties which make it equally important in advanced theory. For instance:

(a) If in (2.1) we take $a$ equal to the mean itself, the mean about $a$ vanishes, and
consequently the sum over the population of deviations from the arithmetic mean is zero.

(b) The mean of a sum is the sum of the means; i.e. if $f_1, \dots, f_n$ are the frequency
functions of $n$ distributions with means $\mu'_1, \dots, \mu'_n$, and if the sum of the frequency
functions is $g$ with mean $\theta'$, then

$$\begin{aligned}
\theta' &= \int_{-\infty}^{\infty} (x - a)\,g(x)\,dx \\
&= \int_{-\infty}^{\infty} (x - a)\{f_1(x) + f_2(x) + \dots + f_n(x)\}\,dx \\
&= \int_{-\infty}^{\infty} (x - a)f_1(x)\,dx + \int_{-\infty}^{\infty} (x - a)f_2(x)\,dx + \dots + \int_{-\infty}^{\infty} (x - a)f_n(x)\,dx \\
&= \mu'_1 + \mu'_2 + \dots + \mu'_n
\end{aligned}$$

(c) We shall see later that mean values are important in the theory of sampling,
mainly in virtue of their mathematical tractability, but also because in a certain sense
the mean is the best measure of location of some distributions.


The Geometric Mean and the Harmonic Mean


2.6. Two other types of mean are in use in elementary statistics, though they are
not of importance in advanced theory.

The geometric mean of $N$ variate-values is the $N$th root of their product and is not used
if any of the variate-values are negative. For proportional frequencies $f(x)$ we have

$$G = \prod_{j=-\infty}^{\infty} x_j^{f_j} \quad\text{or}\quad \log G = \sum_{j=-\infty}^{\infty} f_j \log x_j \tag{2.3}$$

and for actual frequencies $g(x)$, totalling $N$,

$$G = \Big\{\prod x_j^{g_j}\Big\}^{1/N}, \qquad \log G = \frac{1}{N}\sum g_j \log x_j \tag{2.4}$$

The harmonic mean of $N$ variate-values is the reciprocal of the arithmetic mean of
their reciprocals. In the usual notation

$$\frac{1}{H} = \int_{-\infty}^{\infty} \frac{dF}{x} = \int_{-\infty}^{\infty} \frac{f(x)\,dx}{x} \tag{2.5}$$

or, for actual frequencies,

$$\frac{1}{H} = \frac{1}{N}\int_{-\infty}^{\infty} \frac{g(x)\,dx}{x} \tag{2.6}$$

Example 2.3

To find the geometric and harmonic means of the distribution

$$dF = \frac{1}{B(p, q)}(1 - x)^{p-1}x^{q-1}\,dx, \qquad 0 \le x \le 1$$

we have

$$\log G = \frac{1}{B(p, q)}\int_0^1 (1 - x)^{p-1}x^{q-1}\log x\,dx.$$

Now, since by definition

$$\int_0^1 (1 - x)^{p-1}x^{q-1}\,dx = B(p, q)$$

we have, differentiating both sides with respect to $q$, an operation which is legitimate
in virtue of the uniform convergence of the integral and the existence of the resulting
expressions,

$$\int_0^1 (1 - x)^{p-1}x^{q-1}\log x\,dx = \frac{\partial}{\partial q}B(p, q).$$

Thus

$$\log G = \frac{1}{B(p, q)}\frac{\partial}{\partial q}B(p, q) = \frac{\partial}{\partial q}\log\frac{\Gamma(p)\,\Gamma(q)}{\Gamma(p+q)} = \frac{\partial}{\partial q}\{\log\Gamma(q) - \log\Gamma(p+q)\}.$$

The harmonic mean is given by

$$\frac{1}{H} = \frac{1}{B(p, q)}\int_0^1 (1 - x)^{p-1}x^{q-2}\,dx = \frac{B(p, q-1)}{B(p, q)} = \frac{\Gamma(q-1)}{\Gamma(p+q-1)}\cdot\frac{\Gamma(p+q)}{\Gamma(q)} = \frac{p+q-1}{q-1}$$

so that

$$H = \frac{q-1}{p+q-1}.$$

We may note that the arithmetic mean, $\dfrac{q}{p+q}$, is greater than the harmonic mean, for

$$\frac{q}{p+q} = 1 - \frac{p}{p+q}, \qquad \frac{q-1}{p+q-1} = 1 - \frac{p}{p+q-1}$$

and therefore $H < \mu'_1$ if

$$\frac{p}{p+q-1} > \frac{p}{p+q}$$

which is clearly so.

P _ 

p + q - 1 


2.7. In general it may be shown that for distributions in which the variate-values
are not negative

$$H \le G \le \mu'_1 \tag{2.7}$$

Consider in fact the quantity

$$E(t) = \left\{\frac{1}{N}\left(x_1^t + x_2^t + \dots + x_N^t\right)\right\}^{1/t}$$

where the $x$'s are positive numbers. We shall show that this is an increasing function
of $t$, i.e. $E(t_1) > E(t_2)$ if $t_1 > t_2$. As a trivial case these inequalities may be replaced by
equalities, namely if all the $x$'s are equal. We have

$$\log E = \frac{1}{t}\log\left\{\frac{1}{N}\Sigma(x^t)\right\}.$$

Hence, for the function

$$F = t^2\,\frac{d}{dt}\log E$$

we have

$$\frac{dF}{dt} = t\,\frac{d^2}{dt^2}\log\Sigma(x^t) = \frac{t}{\{\Sigma(x^t)\}^2}\left[\Sigma(x^t)\,\Sigma(x^t\log^2 x) - \{\Sigma(x^t\log x)\}^2\right] \tag{2.8}$$

Now in virtue of Schwarz's inequality $\Sigma(a^2)\,\Sigma(b^2) \ge \{\Sigma(ab)\}^2$ the expression in brackets is
not negative. Hence $\dfrac{dF}{dt}$ has the sign of $t$ and $F$ thus has a minimum at $t = 0$. But
when $t = 0$, $F = 0$ and thus $F$ must be non-negative. Therefore $\dfrac{d}{dt}\log E$ is non-negative,
and since $E$ is positive $\dfrac{dE}{dt}$ is non-negative and thus $E$ is a non-decreasing function.

Now in $E(t)$, when $t = 1$ we have the arithmetic mean; when $t = -1$ we have the
harmonic mean; and when $t \to 0$ we have the geometric mean, for

$$\lim_{t \to 0}\log E = \lim_{t \to 0}\frac{\log\left\{\frac{1}{N}\Sigma(x^t)\right\}}{t} = \lim_{t \to 0}\frac{\Sigma(x^t\log x)}{\Sigma(x^t)} = \frac{1}{N}\Sigma\log x.$$

Hence the inequality (2.7) follows.

For simplicity we have stated these results for the discontinuous variate. The analysis,
however, is easily seen to remain true for Stieltjes integrals and hence is generally valid.

Hereafter when the "mean" is mentioned without qualification, the arithmetic mean
is to be understood.
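The ordering of the harmonic, geometric and arithmetic means can be illustrated numerically through the quantity $E(t) = \{\frac{1}{N}\Sigma(x^t)\}^{1/t}$, with $t = -1$, $t \to 0$ and $t = 1$ giving the three means. The sketch below is a minimal illustration; the function name `power_mean` and the set of variate-values are hypothetical.

```python
import math


def power_mean(values, t):
    """E(t) = {(1/N) * sum(x^t)}^(1/t); the limit t -> 0 is the geometric mean."""
    n = len(values)
    if t == 0:
        return math.exp(sum(math.log(x) for x in values) / n)
    return (sum(x ** t for x in values) / n) ** (1.0 / t)


values = [1.0, 2.0, 4.0, 8.0]        # hypothetical positive variate-values

harmonic = power_mean(values, -1)
geometric = power_mean(values, 0)
arithmetic = power_mean(values, 1)

# E(t) evaluated on an increasing grid of t should itself be non-decreasing.
grid = [-3, -1, -0.5, 0, 0.5, 1, 3]
curve = [power_mean(values, t) for t in grid]
```

With all the values equal the three means coincide, which is the trivial case of equality.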


The Median

2.8. The median value is that value of the variate which divides the total frequency
into two equal halves, i.e. is the value such that

$$\int_{-\infty}^{\text{median}} f(x)\,dx = \int_{\text{median}}^{\infty} f(x)\,dx = \tfrac{1}{2} \tag{2.9}$$

There is some small indeterminacy in this definition when the distribution is discontinuous
which may be removed by convention. If there are $(2N + 1)$ members of the population,
we take the median to be the value of the $(N + 1)$th member. If there are $2N$ we take it
to be halfway between the values of the $N$th and the $(N + 1)$th. When the distribution
is numerically specified in class-intervals there is the usual indeterminacy due to grouping,
which may be dealt with in the manner of the following example.
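The convention for a discontinuous distribution given as a list of individual values can be written out directly. In the sketch below the function name `median_by_convention` and the sample values are hypothetical; the rule itself is the one just stated.

```python
def median_by_convention(values):
    """Median of individual values: middle member, or halfway between the two middle members."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        # 2N + 1 members: the (N + 1)th in order of magnitude.
        return ordered[mid]
    # 2N members: halfway between the Nth and the (N + 1)th.
    return (ordered[mid - 1] + ordered[mid]) / 2


odd_case = median_by_convention([3, 1, 2])
even_case = median_by_convention([4, 1, 3, 2])
```

The standard library function `statistics.median` follows the same convention and may be used instead.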



Example 2.4

To find the median value of the distribution of heights considered in Example 2.1.

Half the total frequency of 8585 observations is                      4292.5
There are, up to and including the interval 66- inches,               3589
leaving                                                               703.5
The frequency in the next interval is                                 1329

Hence we take the median to be

$$66\tfrac{15}{16} + \frac{703.5}{1329} = 67.47 \text{ inches.}$$

The mean (Example 2.1) is 67.46 inches, practically the same.

A graphical method of determining the median is given later in this chapter (2.13).

The Mode

2.9. The mode or modal value is that value of the variate exhibited by the greatest
number of members of the distribution. If the frequency function is continuous and
differentiable it is the solution of

$$f'(x) = 0, \qquad f''(x) < 0 \tag{2.10}$$

If $f'(x)$ vanishes and $f''(x)$ is greater than zero we have a minimum, and such a point is
sometimes called an antimode.

In numerically specified distributions and discontinuous distributions generally the
mode is sometimes difficult to determine exactly. It is essentially a concept related to
the continuous frequency function. For example, if the distribution merely consists of
an isolated number of values, each of which occurs only once, there is no mode in the
sense defined above. Where, however, the number is large enough to permit grouping,
there will usually be an interval containing a maximum frequency, and we may regard
the mode as lying in that interval. More generally there may be several maxima, in
which case the distribution is multimodal. In the height distribution of Table 1.7, for
instance, the mode may be considered as lying somewhere in the interval 67- inches.
To estimate its position more accurately it is necessary to fit a continuous curve to the
distribution and determine the mode of the curve. The process of fitting will be considered
in Chapter 6.
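For grouped data the interval, or intervals, containing a maximum frequency can be picked out mechanically. The sketch below returns every class whose frequency exceeds that of both its neighbours, so that a multimodal distribution yields several classes; the function name `modal_classes` and the second, made-up frequency list are illustrative assumptions, while the first list is that of the height distribution as given in Table 2.1.

```python
def modal_classes(labels, freq):
    """Labels of classes whose frequency exceeds that of both neighbouring classes."""
    padded = [0] + list(freq) + [0]      # treat the frequency beyond either end as zero
    return [
        label
        for label, left, mid, right in zip(labels, padded, padded[1:], padded[2:])
        if mid > left and mid > right
    ]


# Height distribution of Table 1.7 (frequencies as given in Table 2.1).
height_labels = [f"{h}-" for h in range(57, 78)]
height_freq = [2, 4, 14, 41, 83, 169, 394, 669, 990, 1223, 1329,
               1230, 1063, 646, 392, 202, 79, 32, 16, 5, 2]
height_modes = modal_classes(height_labels, height_freq)

# A made-up bimodal example.
bimodal_modes = modal_classes(["a", "b", "c", "d", "e"], [1, 5, 2, 6, 1])
```

This locates only the modal interval; as the text notes, fixing the mode within the interval requires fitting a continuous curve.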

2.10. In a symmetrical distribution the mean, the median and the mode (or in
cases such as the U-shaped distribution, the antimode) coincide. For skew distributions
they differ. There is an interesting empirical relationship between the three quantities
which appears to hold for unimodal curves of moderate asymmetry, namely

Mean − Mode = 3 (Mean − Median). . . . (2.11)

A mathematical explanation of this relationship has been given by Doodson (1917).

It is a useful mnemonic to observe that the mean, median and mode occur in the same
order (or the reverse order) as in the dictionary; and that the median is nearer to the mean
than to the mode, just as the corresponding words are nearer together in the dictionary.

In elementary theory the median and the mode have considerable claims to use as
measures of location. They are readily interpretable in terms of ordinary ideas (the
median is the middle value and the mode is the most popular value) and the median
is usually more easily determined than the mean in numerically specified distributions.
What gives the arithmetic mean the greater importance in advanced theory is its superior
mathematical tractability and certain sampling properties; but the median has
compensating advantages (it is, for instance, less dependent on the scale and the form of the
frequency-distribution than the mean) and it seems to deserve more consideration in the
advanced theory than it has received.

Quantiles

2.11. The concept of median value can be easily extended to locate the curve more
accurately by the use of several parameters. We may, for example, find the three
variate-values which divide the total frequency into four equal parts. The middle one of these
will be the median itself; the other two are called the lower and upper quartiles respectively.
Similarly, we may find the nine variate-values which divide the total frequency into ten
equal parts, the deciles. Generally we may find the $(n - 1)$ variate-values which divide
the total frequency into $n$ equal parts, the quantiles. Evidently the knowledge of the
quantiles for some fairly high $n$, such as 10, gives a very good idea of the general form
of the frequency-distribution. Even the quartiles and the median are valuable general
guides.

2.12. The determination of the quantiles of a numerically specified distribution
proceeds as for the median, indeterminacies being resolved by the usual conventions.
That of the quantiles of a mathematically specified distribution, say the $j$th quantile,
is a matter of solving the equation

$$\frac{j}{n} = \int_{-\infty}^{x} dF \tag{2.12}$$

which can be done without difficulty by interpolation when the integral of $dF$ has been
tabulated.

Example 2.5

To find the quartiles of the height distribution considered in Example 2.1.

One-quarter of the total frequency is 8585/4 = 2146.25.

Up to the interval 65- there are 1376 members, leaving 770.25 members.

In the next interval there are 990 members.

Thus the lower quartile is 64 15/16 + 770.25/990 = 65.71 inches.

The upper quartile will be found to be 69.21 inches.
We have already found (Example 2.4) that the median is 67.47 inches.
Denoting the quartiles by $Q_1$ and $Q_3$ we see that

    median - Q_1 = 1.76 inches

    Q_3 - median = 1.74 inches

so that the median is almost half-way between the quartiles, an indication of the symmetry
of the distribution.

The Distribution Curve or Ogive of Galton

2.13. The quantiles may also be determined graphically. Suppose we plot x, the
variate, along a horizontal axis and $\Sigma f(x)$, the cumulated frequency up to and including




x, along the perpendicular y-axis. We then get a series of points through which, in general,
a smooth curve may be drawn. This curve, as is evident from its definition, is

$$ y = F(x), $$

i.e. the graph of the distribution function. It is sometimes called the graduation curve,
or Galton's ogive (though it is only shaped like an ogive in certain cases such as that of a
unimodal symmetrical curve). We shall use the expression "distribution curve."

Fig. 2.1 illustrates the distribution curve for the J-shaped distribution of Table 1.2, and
Fig. 2.2 that for the unimodal symmetrical distribution of Table 1.7. A freehand curve has
been drawn in both cases.

[Fig. 2.1. Distribution Curve of the Data of Table 1.2. Horizontal axis: Annual Income (£000).]

[Fig. 2.2. Distribution Curve of the Data of Table 1.7. (Heights shown to correspond to
entries in the Table, e.g. cumulated frequency at 64 inches is the frequency up to and
including the range 64- and therefore up to 64 15/16 inches.)]

Curves of this kind can be used to determine the quantiles. In fact, to find the median,
we merely have to find the abscissa corresponding to the ordinate N/2,
and so on. The positions of the quartiles and the median are shown in Fig. 2.2, and
the reader may care to compare the values obtained by reading the graph by eye with
those given in Example 2.5.
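The arithmetic of Example 2.5 may be sketched as follows. The frequencies are those of the height distribution as they appear in Table 2.3; the function name `grouped_quantile` is hypothetical, and the interval boundary (the interval "64-" is assumed to end at 64 15/16 inches, as the note to Fig. 2.2 indicates) is an assumption carried into the `offset` argument.

```python
from fractions import Fraction

# Lower labels of the intervals "57-", "58-", ..., "77-" and their frequencies.
labels = list(range(57, 78))
freqs = [2, 4, 14, 41, 83, 169, 394, 669, 990, 1223, 1329,
         1230, 1063, 646, 392, 202, 79, 32, 16, 5, 2]

def grouped_quantile(p, labels, freqs, offset=Fraction(-1, 16), width=1):
    """Variate-value below which a proportion p of the total frequency lies.

    Assumes the interval labelled L runs from L + offset to L + offset + width
    and that the frequency is spread evenly within each interval.
    """
    total = sum(freqs)
    target = p * total
    cum = 0
    for label, f in zip(labels, freqs):
        if cum + f >= target:
            start = label + offset        # assumed true lower boundary
            return float(start + Fraction(target - cum) * width / f)
        cum += f
    raise ValueError("p must lie between 0 and 1")

q1 = grouped_quantile(0.25, labels, freqs)
med = grouped_quantile(0.50, labels, freqs)
q3 = grouped_quantile(0.75, labels, freqs)
```

Under these assumptions the three values agree with those of Example 2.5 to within rounding in the second decimal place.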

Measures of Dispersion

2.14. We now proceed to consider the quantities which have been proposed to
measure the dispersion of a distribution. They fall into three groups:

(a) Measures of the distance (in terms of the variate) between certain representative
values, such as the range, the interdecile range or the interquartile range.

(b) Measures compiled from the deviations of every member of the population from
some central value, such as the mean deviation from the mean, the mean deviation from
the median, and the standard deviation.

(c) Measures compiled from the deviations of all the members of the population among
themselves, such as the mean difference.

In advanced theory the outstandingly important measure is the standard deviation;
but they all require some mention.


Range and Interquantile Differences

2.15. The range of a distribution is the difference of the greatest and least variate-
values borne by its members. As a descriptive parameter of a population it has very little
use. A knowledge of the whereabouts of the end values obviously tells little about the
way the bulk of the distribution is condensed inside the range; and for distributions of
infinite range it is obviously wholly inappropriate.

More useful rough-and-ready measures may be obtained from the quantiles, and there
are two such in general use. The interquartile range is the distance between the upper
and lower quartiles, and is thus a range which contains one-half the total frequency.
The interdecile range (or perhaps, more accurately, the 1-9th interdecile range) is the distance
between the first and the ninth decile. Both these measures evidently give some approximate
idea of the "spread" of a distribution, and are easily calculable. For this reason
they are fairly generally used in elementary descriptive statistics. In advanced theory
they suffer from the disadvantage of being difficult to handle mathematically in the
theory of sampling.


Mean Deviations

2.16. The amount of scatter in a population is evidently measured to some extent
by the totality of deviations from the mean. We have seen (2.5) that the sum of these
deviations taken with appropriate sign is zero. We may however write

$$ \delta_1 = \int_{-\infty}^{\infty} |x - \mu'_1| \, dF $$ (2.13)

where the deviations are now taken absolutely, and define $\delta_1$ to be a coefficient of dispersion.
We shall call it the mean deviation about the mean.

Similarly for the median we may write

$$ \delta_2 = \int_{-\infty}^{\infty} |x - \text{median}| \, dF $$ (2.14)

and call $\delta_2$ the mean deviation about the median.

In future the words "mean deviation" alone will be taken to refer to the mean
deviation about the mean.

Both these measures have merits in elementary work, being fairly easily calculable.
Once again, however, they are practically excluded from advanced work by their intractability
in the theory of sampling.

Standard Deviation

2.17. We have seen that the mean about an arbitrary point a is given by

$$ \mu'_1 = \int_{-\infty}^{\infty} (x - a) \, dF. $$

We may, by analogy with the terminology of Statics, call this the first moment, and define
the second moment by

$$ \mu'_2 = \int_{-\infty}^{\infty} (x - a)^2 \, dF $$ (2.15)

The second moment about the mean is written without the prime, thus:

$$ \mu_2 = \int_{-\infty}^{\infty} (x - \mu'_1)^2 \, dF $$ (2.16)

and is called the Variance. The positive square root of the variance is called the standard
deviation, and usually denoted by $\sigma$, so that we have

$$ \sigma = +\sqrt{\mu_2} $$ (2.17)

The variance is thus the mean of the squares of deviations from the mean. The device
of squaring and then taking the square root of the resultant sum in order to obtain the
standard deviation may appear a little artificial, but it makes the mathematics of the
sampling theory very much simpler than is the case, for example, with the mean deviation.

The calculation of the variance and the standard deviation proceeds by an easy
extension of the methods used for the mean. In particular, if b is some arbitrary value

$$ \mu'_2 \text{ (about } a) = \int_{-\infty}^{\infty} (x - a)^2 \, dF $$

$$ = \int_{-\infty}^{\infty} \{(x - b)^2 + 2(b - a)(x - b) + (b - a)^2\} \, dF $$

$$ = \mu'_2 \text{ (about } b) + 2(b - a)\,\mu'_1 \text{ (about } b) + (b - a)^2 $$ (2.18)

If now b is the mean, the first moment about b vanishes and we have

$$ \mu'_2 = \mu_2 + (\mu'_1 - a)^2 $$

or

$$ \mu_2 = \mu'_2 - (\mu'_1 - a)^2 $$ (2.19)

Thus the variance can easily be found from the second moment about an arbitrary point,
which can be selected to simplify the calculations.
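Equation (2.19) may be illustrated numerically. The sketch below is a minimal example with made-up data; the values, the working origin `a` and the helper name `moments_about` are all illustrative assumptions, not taken from the text.

```python
def moments_about(values, a):
    """First and second moments of equally weighted values about the point a."""
    n = len(values)
    m1 = sum(x - a for x in values) / n
    m2 = sum((x - a) ** 2 for x in values) / n
    return m1, m2

values = [2.0, 3.0, 5.0, 7.0, 11.0]   # illustrative data only
a = 5.0                               # arbitrary working origin

d, m2_about_a = moments_about(values, a)   # d is (mean - a)
variance = m2_about_a - d ** 2             # equation (2.19)

# Direct calculation about the mean, for comparison.
mean = a + d
direct = sum((x - mean) ** 2 for x in values) / len(values)
```

The two routes give the same variance whatever origin is chosen, which is why a convenient working origin may be used in hand calculation.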


Example 2.6 

To find the mean deviation and the standard deviation for the distribution of men 
according to height considered in Example 2.1 (Table 1.7). 

In the case of the mean deviation for a grouped distribution, the sum of deviations
should first be calculated from the centre of the class-interval in which the mean lies and
then reduced to the mean as origin. It so happens that in Table 2.1 the mean fell in the
interval taken as origin, so that the preliminary arithmetic already exists in the Table.

The sum of positive deviations is 8763 and that of negative deviations -8584.
Hence the sum of deviations regardless of sign is 17,347, the unit being the class-interval
and the origin the centre of the interval.

To reduce to the mean as origin, we note that if the number of observations below
the mean is $N_1$ and the number above the mean is $N_2$, and $d = \mu'_1 - a$, we have to add $N_1 d$
to the sum of deviations about the centre of the interval and subtract $N_2 d$. In this case
d = 0.02 (Example 2.1), $N_1$ = 4918, $N_2$ = 3667. Hence we add (4918 - 3667)0.02 = 25.

Hence the mean deviation

$$ = \frac{17,347 + 25}{8585} = 2.02 \text{ inches.} $$


For the standard deviation some further calculation is required, as shown in Table 2.2.

TABLE 2.2

Calculation of the Standard Deviation for the Distribution of Table 1.7.
(Some preliminary calculation already carried out in Table 2.1.)

    (1)          (2)          (3)          (4)
  Height,     Frequency    Deviation
  inches.        f.           ξ.          ξ²f.

    57-            2          -10           200
    58-            4           -9           324
    59-           14           -8           896
    60-           41           -7         2,009
    61-           83           -6         2,988
    62-          169           -5         4,225
    63-          394           -4         6,304
    64-          669           -3         6,021
    65-          990           -2         3,960
    66-         1223           -1         1,223
    67-         1329            0             0
    68-         1230            1         1,230
    69-         1063            2         4,252
    70-          646            3         5,814
    71-          392            4         6,272
    72-          202            5         5,050
    73-           79            6         2,844
    74-           32            7         1,568
    75-           16            8         1,024
    76-            5            9           405
    77-            2           10           200

  Totals        8585                     56,809


Column (4) shows the sum $\Sigma \xi^2 f$, where f is the actual frequency. We then have, for
the second moment about the arbitrary origin,

$$ \mu'_2 = \frac{56,809}{8585} = 6.6172. $$

We have already found in Example 2.1 that

$$ \mu'_1 - a = \frac{179}{8585} = 0.0209. $$

Hence, in virtue of (2.19),

$$ \mu_2 = 6.6172 - (0.0209)^2 = 6.6168 $$

$$ \sigma = \sqrt{\mu_2} = 2.57 \text{ inches.} $$

It may be noted that the mean deviation is about 80 per cent. of the standard deviation.
This relationship often holds approximately for unimodal curves approaching symmetry.
The reason will become apparent when we study the so-called "normal" distribution in
Chapter 5.
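The calculations of Example 2.6 can be retraced in a few lines. This is a sketch only: the frequencies and deviations are those of Table 2.2, the working origin is the centre of the interval 67- with the class-interval (one inch) as unit, and the variable names are chosen here for illustration.

```python
import math

# Frequencies f and deviations xi (in class-intervals) as in Table 2.2.
freqs = [2, 4, 14, 41, 83, 169, 394, 669, 990, 1223, 1329,
         1230, 1063, 646, 392, 202, 79, 32, 16, 5, 2]
xis = list(range(-10, 11))

N = sum(freqs)
sum_xi_f = sum(f * x for f, x in zip(freqs, xis))        # 8763 - 8584
sum_xi2_f = sum(f * x * x for f, x in zip(freqs, xis))   # column (4) total

d = sum_xi_f / N                     # mean measured from the working origin
mu2_origin = sum_xi2_f / N           # second moment about the working origin
mu2 = mu2_origin - d ** 2            # equation (2.19)
sigma = math.sqrt(mu2)

# Mean deviation taken directly about the mean.
mean_dev = sum(f * abs(x - d) for f, x in zip(freqs, xis)) / N
ratio = mean_dev / sigma
```

The ratio of the mean deviation to the standard deviation comes out a little under 0.8, in keeping with the remark at the end of the Example.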

Example 2.7

To find the variance of the distribution

$$ dF = \frac{1}{B(p, q)} x^{q-1} (1 - x)^{p-1} \, dx, \quad 0 \le x \le 1. $$

We have, about the origin,

$$ \mu'_2 = \frac{B(p, q + 2)}{B(p, q)} = \frac{(q + 1)q}{(p + q + 1)(p + q)}. $$

We have already found (Example 2.2) that

$$ \mu'_1 = \frac{q}{p + q}. $$

Thus

$$ \mu_2 = \mu'_2 - \mu'^2_1 $$

$$ = \frac{(q + 1)q}{(p + q + 1)(p + q)} - \frac{q^2}{(p + q)^2} $$

$$ = \frac{pq}{(p + q + 1)(p + q)^2}. $$


Sheppard's Corrections

2.18. The treatment of the values of a grouped frequency-distribution as if they
were concentrated at the mid-points of intervals is an approximation, and in certain
circumstances it is possible to make corrections for any distortion introduced thereby.
These so-called "Sheppard's corrections" will be discussed at length in the next chapter,
but at this stage we may indicate without proof the appropriate correction for the second
moment.

If the distribution is continuous and has high order contact with the variate-axis
at its extremities, i.e. if it "tails off" slowly, the crude second moment calculated from
grouped frequencies should be corrected by subtracting from it $h^2/12$, where h is the width
of the interval. For example, in the height data of Example 2.6, we have h = 1, and the
corrected second moment is

$$ 6.6168 - 0.0833 = 6.5335. $$

The corrected value of $\sigma$ is $\sqrt{6.5335} = 2.56$, as against an uncorrected value of 2.57.



Mean Difference

2.19. The coefficient of mean difference (not to be confused with mean deviation)
is defined by

$$ \Delta_1 = \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} |x - y| \, dF(x) \, dF(y) $$

$$ = \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} |x - y| f(x) f(y) \, dx \, dy $$ (2.20)

In the discontinuous case two different formulae arise. We have either

$$ \Delta_1 = \frac{1}{N(N - 1)} \sum_{j=-\infty}^{\infty} \sum_{k=-\infty}^{\infty} |x_j - x_k| f(x_j) f(x_k), \quad j \ne k $$ (2.21)

the mean difference without repetition, or

$$ \Delta_1 = \frac{1}{N^2} \sum_{j=-\infty}^{\infty} \sum_{k=-\infty}^{\infty} |x_j - x_k| f(x_j) f(x_k) $$ (2.22)

the mean difference with repetition. The difference lies only in the divisor and is
unimportant if N is large.

The mean difference is the average of the differences of all the possible pairs of variate-
values, taken regardless of sign. In the coefficient with repetition each value is taken
with itself, adding of course nothing to the sum of deviations, but resulting in the total
number of pairs being $N^2$. In the coefficient without repetition only different values are
taken, so that the number of pairs is N(N - 1). Hence the divisors in (2.21) and (2.22).



2.20. The mean difference, which is due to Gini (1912), has a certain theoretical
attraction, being dependent on the spread of the variate-values among themselves and
not on the deviations from some central value. It is, however, more difficult to compute
than the standard deviation, and the appearance of the absolute values in the defining
equations indicates, as for the mean deviation, the appearance of difficulties in the theory
of sampling. It might be thought that this inconvenience could be overcome by the
definition of a coefficient

$$ \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} (x - y)^2 \, dF(x) \, dF(y). $$

This, however, is nothing but twice the variance. For

$$ \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} (x^2 - 2xy + y^2) \, dF(x) \, dF(y) $$

$$ = \int x^2 \, dF(x) \int dF(y) - 2 \int x \, dF(x) \int y \, dF(y) + \int dF(x) \int y^2 \, dF(y) $$

$$ = 2\mu'_2 - 2\mu'^2_1 $$

$$ = 2\mu_2. $$ (2.23)

This interesting relation shows that the variance may in fact be defined as half the
mean square of all possible variate differences, that is to say, without reference to deviations
from a central value, the mean.
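Relation (2.23) is easily checked on a small set of values. The data below are illustrative only, and every pair is counted "with repetition" (each value also paired with itself), which corresponds to the double integral above.

```python
values = [1.0, 4.0, 4.0, 9.0, 12.0]   # illustrative data only
n = len(values)

mean = sum(values) / n
variance = sum((x - mean) ** 2 for x in values) / n

# Mean square of all n*n variate differences, pairs taken with repetition.
mean_square_diff = sum((x - y) ** 2 for x in values for y in values) / n ** 2

half_mean_square_diff = mean_square_diff / 2
```

Here `half_mean_square_diff` is computed without any reference to the mean, yet it reproduces the variance.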



Coefficients of Variation: Standard Measure

2.21. The foregoing measures of dispersion have all been expressed in terms of units
of the variate. It is thus difficult to compare dispersions in different populations unless
the units happen to be identical; and this has led to a search for measures which shall
be independent of the variate scale, that is to say, shall be pure numbers.

Several coefficients of this kind may be constructed, such as the mean deviation divided
by the mean or by the median. Only two have been used at all extensively in practice, Karl Pearson's
coefficient of variation, defined by

$$ V = 100 \, \frac{\sigma}{\mu'_1} $$ (2.24)

and Gini's coefficient of concentration, defined by

$$ G = \frac{\Delta_1}{2\mu'_1} $$ (2.25)

Both these coefficients suffer from the disadvantage of being affected very much by
the value of the mean measured from some arbitrary origin, and are hardly suitable for
advanced work.


2.22. For our purposes, comparability may be attained in a somewhat different way.
Let us take $\sigma$ itself as a new unit and express the frequency function in terms of a new
variable $\xi$ related to x by

$$ \xi = \frac{x - \mu'_1}{\sigma} $$ (2.26)

Any distribution expressed in this way has zero mean and unit variance. It is then said
to be expressed in standard measure. Two distributions in standard measure can be readily
compared in regard to form, skewness, and other qualities, though not of course in regard
to mean and variance.

Concentration

2.23. Gini's coefficient of concentration arises in a natural way from the following
approach.

Writing, as usual

$$ F(x) = \int_{-\infty}^{x} f(x) \, dx $$ (2.27)

let us define

$$ \Phi(x) = \frac{1}{\mu'_1} \int_{-\infty}^{x} x f(x) \, dx $$ (2.28)

$\Phi(x)$ exists, of course, only if $\mu'_1$ exists. Just as F(x) varies from 0 to 1, $\Phi(x)$ varies from
0 to 1 provided that the origin is taken to the left of the start of the frequency-distribution,
which we shall assume to be so. $\Phi(x)$ may be called the incomplete first moment.

Now (2.27) and (2.28) may be regarded as defining a relationship between the
variables F and $\Phi$ in terms of parametric equations in x.* The curve whose ordinate
and abscissa are $\Phi$ and F is called the curve of concentration. Such a curve is shown
in Fig. 2.3.

* The definition of curves by parametric equations will be found treated in most textbooks of
differential calculus. The term "parameter" in this connection is usual in mathematics, but is not
to be confused with the more special statistical parameter as defined in 2.2.


[Fig. 2.3. Curve of Concentration.]

The curve of concentration must be convex to the F-axis, for we have

$$ \frac{d\Phi}{dF} = \frac{x f(x)}{\mu'_1 f(x)} = \frac{x}{\mu'_1}, $$

which is positive since our origin is taken to the left of the start of the distribution, and

$$ \frac{d^2\Phi}{dF^2} = \frac{1}{\mu'_1 f(x)}, $$

which is also positive. Thus the tangent to the curve makes a positive acute angle with the F-axis, and the angle
increases as F increases; in other words, the curve is convex to the F-axis.

The area between the concentration curve and the line $F = \Phi$ is called the area
of concentration. We proceed to show that it is equal to one-half the coefficient of
concentration.

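In fact, we have from Fig. 2.3

$$ 2 \, (\text{area of concentration}) = \int_0^1 F \, d\Phi - \int_0^1 \Phi \, dF $$

and thus

$$ 2\mu'_1 \, (\text{area}) = \int_{-\infty}^{\infty} F(x) \, x \, dF(x) - \mu'_1 \int_{-\infty}^{\infty} \Phi(x) \, dF(x) $$

$$ = \int_{-\infty}^{\infty} x \, dF(x) \int_{-\infty}^{x} dF(y) - \int_{-\infty}^{\infty} dF(x) \int_{-\infty}^{x} y \, dF(y) $$

$$ = \int_{-\infty}^{\infty} \int_{-\infty}^{x} (x - y) \, dF(x) \, dF(y). $$

Now

$$ \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} (x - y) \, dF(x) \, dF(y) = 0, $$

so that the contribution from the region y < x is equal and opposite to that from y > x, and hence

$$ 2\mu'_1 \, (\text{area}) = \frac{1}{2} \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} |x - y| \, dF(x) \, dF(y) = \frac{1}{2} \Delta_1. $$

Thus

$$ \text{area of concentration} = \frac{1}{2} \cdot \frac{\Delta_1}{2\mu'_1} = \frac{1}{2} G, $$

where G is the coefficient of concentration.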
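The equality between the area of concentration and one-half of the coefficient of concentration may be checked on a small discontinuous population. The sketch below assumes positive variate-values (origin to the left of the start of the distribution), uses the mean difference with repetition, and joins successive points of the concentration curve by straight lines; the incomes and function names are illustrative only.

```python
def mean_difference_with_repetition(values):
    """Mean of |x - y| over all N*N ordered pairs, each value paired with itself."""
    n = len(values)
    return sum(abs(x - y) for x in values for y in values) / n ** 2

def area_of_concentration(values):
    """Area between the line Phi = F and the concentration curve (trapezoidal)."""
    xs = sorted(values)
    n = len(xs)
    total = sum(xs)
    area_under_curve = 0.0
    phi_prev = 0.0
    running = 0.0
    for x in xs:
        running += x
        phi = running / total                    # incomplete first moment at F = k/n
        area_under_curve += (phi_prev + phi) / (2 * n)
        phi_prev = phi
    return 0.5 - area_under_curve

incomes = [1.0, 2.0, 3.0, 10.0, 24.0]            # illustrative data only
mean = sum(incomes) / len(incomes)

delta1 = mean_difference_with_repetition(incomes)
G = delta1 / (2 * mean)                          # coefficient of concentration
area = area_of_concentration(incomes)
```

For these figures the mean difference is 8.64 and G = 0.54, while the area of concentration is 0.27, i.e. one-half of G.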


2.24. Various methods have been given for calculating the mean difference. The
following is probably the simplest, particularly for distributions specified in equal group-
intervals.

Let us, without loss of generality, take an origin at the start of the distribution. We
may then write

$$ \sum_{j=1}^{N} \sum_{k=1}^{N} |x_j - x_k| = 2 \, \Sigma' (x_j - x_k) $$

the summation $\Sigma'$ being taken over values such that j > k. We have also

$$ x_j - x_k = (x_j - x_{j-1}) + (x_{j-1} - x_{j-2}) + \ldots + (x_{k+1} - x_k). $$

Thus

$$ \Sigma' (x_j - x_k) = \sum_{h=1}^{N-1} C_h (x_{h+1} - x_h) $$

where $C_h$ is the number of terms of type $(x_j - x_k)$ in $\Sigma'$ containing $(x_{h+1} - x_h)$. Since h is
the number of values of k less than or equal to h (the origin being at the start of the
distribution) and N - h the number of values of j greater than or equal to h + 1, we have $C_h = h(N - h)$,
and thus

$$ \Delta_1 = \frac{2}{N^2} \, \Sigma' (x_j - x_k) $$

$$ = \frac{2}{N^2} \sum_{h=1}^{N-1} h(N - h)(x_{h+1} - x_h) $$ (2.29)

This form is particularly useful if all the intervals are equal. $F_h$ being the distribution
function of $x_h$ we then have, with the interval as unit,

$$ \Delta_1 = 2 \sum_{h=1}^{N-1} F_h (1 - F_h) $$ (2.30)

If the actual cumulated frequency for $x_h$ is $G_h$ we have

$$ \Delta_1 = \frac{2}{N^2} \sum G_h (N - G_h) $$ (2.31)

the most convenient form in practice.



Example 2.8

Returning once more to the height distribution considered in previous examples, we
may calculate $\Sigma G_h (N - G_h)$ as in Table 2.3.



TABLE 2.3

Calculation of the Mean Difference for the Height Distribution of Table 1.7.

  Height,    Frequency.     G_h.      N - G_h.     G_h(N - G_h).
  inches.

    57-           2            2        8583            17,166
    58-           4            6        8579            51,474
    59-          14           20        8565           171,300
    60-          41           61        8524           519,964
    61-          83          144        8441         1,215,504
    62-         169          313        8272         2,589,136
    63-         394          707        7878         5,569,746
    64-         669         1376        7209         9,919,584
    65-         990         2366        6219        14,714,154
    66-        1223         3589        4996        17,930,644
    67-        1329         4918        3667        18,034,306
    68-        1230         6148        2437        14,982,676
    69-        1063         7211        1374         9,907,914
    70-         646         7857         728         5,719,896
    71-         392         8249         336         2,771,664
    72-         202         8451         134         1,132,434
    73-          79         8530          55           469,150
    74-          32         8562          23           196,926
    75-          16         8578           7            60,046
    76-           5         8583           2            17,166
    77-           2         8585           0                 0

  Totals       8585                                105,990,850


We have, from (2.31), for the mean difference with repetition,

$$ \Delta_1 = \frac{2 \times 105,990,850}{8585^2} = 2.88 \text{ inches} $$

as against a mean deviation of 2.02 inches and a standard deviation of 2.57 inches (Example
2.6). There is, of course, nothing inconsistent in the difference between these values.
The coefficients are different in nature, and there is no reason why their numerical values
in any particular case should approach equality.
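The working of Table 2.3 may be reproduced and compared with a direct summation over all pairs of intervals. This is a sketch under the assumptions of the Example: unit class-intervals, all members of an interval treated as sharing one variate-value, and the mean difference taken with repetition; the variable names are chosen here for illustration.

```python
from itertools import accumulate

# Frequencies of the height distribution, intervals 57- to 77-.
freqs = [2, 4, 14, 41, 83, 169, 394, 669, 990, 1223, 1329,
         1230, 1063, 646, 392, 202, 79, 32, 16, 5, 2]

N = sum(freqs)
G = list(accumulate(freqs))                 # cumulated frequencies G_h
S = sum(g * (N - g) for g in G)             # final column total of Table 2.3
delta1 = 2 * S / N ** 2                     # equation (2.31), in inches

# Direct double sum over pairs of intervals, for comparison with (2.22).
direct = sum(fj * fk * abs(j - k)
             for j, fj in enumerate(freqs)
             for k, fk in enumerate(freqs)) / N ** 2
```

The cumulated-frequency form needs one pass through the table, whereas the direct form needs a pass for every pair of intervals; both give the same mean difference.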


NOTES AND REFERENCES

The relationship between mean, median and mode expressed in equation (2.11) was
discussed from the mathematical point of view by Doodson (1917), who showed that it
holds as a first approximation for continuous distributions deviating only moderately from
symmetry.

It was shown by Dunham Jackson (1921) that the indeterminacy in the definition of
the median can be removed by a more sophisticated mathematical approach. He showed
that for N values $x_1, \ldots, x_N$ the sum $\sum_{j=1}^{N} |x_j - \xi|^p$, considered as a function of $\xi$, has
a minimum for some unique $\xi_p$ if p > 1; and further that as p -> 1, $\xi_p$ tends to some
unique value, which may be defined as the median.

The proof of the increasing character of the function F(t) of 2.7 is due to Norris (1935),
who gives references to earlier proofs.

The work of the Italian school on concentration does not appear to have been treated
in English books. The fundamental memoir is that of Gini (1912), who has returned to
the subject in subsequent papers, many of them in Metron. For methods of calculating
the mean difference, see de Finetti and Paciello (1930).

de Finetti, B., and Paciello, U. (1930), "Sui metodi proposti per il calcolo della differenza
media," Metron, 8, part 3, 89.

Doodson, A. T. (1917), "Relation of Mode, Median and Mean in Frequency Curves,"
Biometrika, 11, 425.

Gini, C. (1912), "Variabilità e Mutabilità," Studi Economico-Giuridici della R. Università
di Cagliari, Anno 3, part 2, p. 80.

Gini, C., and Galvani, L. (1929), "Di talune estensioni dei concetti di media ai caratteri
qualitativi," Metron, 8, parts 1-2, 3.

Jackson, Dunham (1921), "Note on the median of a set of numbers," Bull. Amer. Math.
Soc., 27, 160.

Norris, Nilan (1935), "Inequalities among averages," Ann. Math. Stats., 6, 27.


EXERCISES 


2.1. Show that the mean deviation about an arbitrary point is least when that point 
is the median. 


2.2. Show that the mean (about the origin) of the discontinuous distribution whose
frequencies at 0, 1, 2, . . . r, . . . are

$$ e^{-m}, \quad \frac{m}{1!} e^{-m}, \quad \frac{m^2}{2!} e^{-m}, \quad \ldots, \quad \frac{m^r}{r!} e^{-m}, \quad \ldots $$

is m, and that the variance is also m.

2.3. Show that, if deviations are small compared with the value of the mean, we
have approximately, for the Geometric and Harmonic means,

$$ G = \mu'_1 \left(1 - \frac{\mu_2}{2\mu'^2_1}\right), \quad H = \mu'_1 \left(1 - \frac{\mu_2}{\mu'^2_1}\right), $$

and hence that

$$ \mu'_1 - 2G + H = 0. $$

2.4. Show that the mean deviation about the mean is not greater than the standard
deviation.

2.5. Show that for the "rectangular" population

$$ dF = dx, \quad 0 \le x \le 1 $$

$$ \mu'_1 \text{ (about the origin)} = \tfrac{1}{2} $$

$$ \mu_2 = \tfrac{1}{12} $$

$$ \text{mean deviation} = \tfrac{1}{4}. $$

2.6. Show that for the distribution

$$ dF = y_0 \, e^{-x/\sigma} \, dx, \quad 0 \le x \le \infty $$

the mean, standard deviation and mean difference are all equal to $\sigma$; and that the
interquartile range is $\sigma \log_e 3$.

2.7. Show that for the distribution 

dF ~ yo« 0 < ;^ < 00 

fi\ (about the origin) = 

/4 == »’• 


2.8. Show that if a range of six times the standard deviation contains at least
18 class-intervals, Sheppard's correction will make a difference of less than 0.5 per cent.
in the uncorrected value of the standard deviation.

2.9. Show that for a continuous distribution

$$ \Delta_1 = 2 \int_{-\infty}^{\infty} F(x) \{1 - F(x)\} \, dx. $$

2.10. If the variate-values of a distribution are x\ . . . in ascending order of 
magnitude and 

«r = 

then AI 


V - Z‘ 




r-l 

N 


^ — y'jr 

r-i 


/-I 




-|. 1)m[ - 2U}. 





CHAPTER 3 

MOMENTS AND CUMULANTS 


Definition of Moments

3.1. In the previous chapter we defined the first moment (arithmetic mean) about
an arbitrary point a by the Stieltjes integral

$$ \mu'_1 = \int_{-\infty}^{\infty} (x - a) \, dF $$ (3.1)

and the second moment about the point by

$$ \mu'_2 = \int_{-\infty}^{\infty} (x - a)^2 \, dF $$ (3.2)

In generalisation of these equations we may define a series of coefficients $\mu'_r$, r = 1, 2 . . .,
by the relation

$$ \mu'_r = \int_{-\infty}^{\infty} (x - a)^r \, dF $$ (3.3)

$\mu'_r$ is called the moment of order r about the point a. When a is the mean $\mu'_1$ we write
the moment without the prime,

$$ \mu_r = \int_{-\infty}^{\infty} (x - \mu'_1)^r \, dF $$ (3.4)

In particular

$$ \mu_1 = 0, $$

and we may also define a moment of zero order

$$ \mu_0 = \int_{-\infty}^{\infty} dF = 1. $$

It is assumed that when reference is made to the rth moment of a particular distribution,
the appropriate integral (3.3) converges for that distribution. As will be seen later, some
of the theoretical distributions encountered in statistics do not possess moments of all
orders; some, in fact, possess only a few moments of low order, and one or two do not
possess any, except of course the moment of order zero.


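3.2. If a and b are two variate-values, let b - a = c and denote the moments about
a and b by $\mu'(a)$ and $\mu'(b)$ respectively. Then we have, by the binomial theorem,

$$ (x - a)^r = (x - b + b - a)^r = (x - b + c)^r = \sum_{j=0}^{r} \binom{r}{j} (x - b)^{r-j} c^j. $$

Hence

$$ \int_{-\infty}^{\infty} (x - a)^r \, dF = \sum_{j=0}^{r} \binom{r}{j} c^j \int_{-\infty}^{\infty} (x - b)^{r-j} \, dF, $$

so that

$$ \mu'_r(a) = \sum_{j=0}^{r} \binom{r}{j} \mu'_{r-j}(b) \, c^j $$ (3.5)

This equation gives the rth moment about a in terms of the rth and lower moments about
b. It may be written in a symbolic form which will be found to provide a useful mnemonic,
namely

$$ \mu'_r(a) = \{\mu'(b) + c\}^r $$

with the convention that the expression on the right is to be expanded binomially and
the form $\{\mu'(b)\}^j$ replaced by $\mu'_j(b)$.

The equation (3.5) is of particular importance if one of the values a or b is the mean
of the distribution. In this case we have

$$ \mu'_r = \sum_{j=0}^{r} \binom{r}{j} \mu_{r-j} \, \mu'^j_1 $$ (3.6)

$$ \mu_r = \sum_{j=0}^{r} \binom{r}{j} \mu'_{r-j} \, (-\mu'_1)^j $$ (3.7)

In particular

$$ \mu'_2 = \mu_2 + \mu'^2_1 $$
$$ \mu'_3 = \mu_3 + 3\mu'_1 \mu_2 + \mu'^3_1 $$
$$ \mu'_4 = \mu_4 + 4\mu'_1 \mu_3 + 6\mu'^2_1 \mu_2 + \mu'^4_1 $$ (3.8)

and

$$ \mu_2 = \mu'_2 - \mu'^2_1 $$
$$ \mu_3 = \mu'_3 - 3\mu'_1 \mu'_2 + 2\mu'^3_1 $$
$$ \mu_4 = \mu'_4 - 4\mu'_1 \mu'_3 + 6\mu'^2_1 \mu'_2 - 3\mu'^4_1 $$ (3.9)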

Calculation of Moments 

3.3. For a distribution specified numerically in a frequency table the calculation
of moments of third and higher orders is akin to that of the first and second moments.
For grouped data (high order moments are hardly ever required for ungrouped data) the
observations are regarded as concentrated at the mid-points of intervals; a convenient
arbitrary origin a is chosen, the moments about a calculated, and then if necessary the
moments about the mean are ascertained from (3.6) or (3.7). The effect of grouping may
be corrected for in certain cases.

In practice numerical moments of order higher than the fourth are rarely required,
being so sensitive to sampling fluctuations that values computed from moderate numbers
of observations are subject to a large margin of error.

There are two methods in general use for arriving at the moments about an arbitrary
origin. The first is an immediate generalisation of the methods used in Chapter 2 for the
first two moments. The second will be considered in 3.10 in connection with factorial
moments.

Example 3.1 

To find the first four moments about the mean of the distribution of Australian 
marriages of Table 1.8. 

Until the last stage we work in units of three years, the variate interval. A working
mean is taken at 28.5 years. To check the arithmetic we use an identity of type

$$ (x + 1)^3 = x^3 + 3x^2 + 3x + 1 $$

$$ (x + 1)^4 = x^4 + 4x^3 + 6x^2 + 4x + 1. $$

Thus, for instance, the value of $g(x)(x + 1)^3$ is found in addition to that of $g(x)x^3$ and the
two checked by identities such as

$$ \Sigma g(x)(x + 1)^3 = \Sigma g(x)x^3 + 3\Sigma g(x)x^2 + 3\Sigma g(x)x + \Sigma g(x), $$

g(x) being the actual frequencies. The arithmetic work is shown in Table 3.1.


TABLE 3.1

Calculation of the First Four Moments of the Distribution of Marriages of Table 1.8.

Mid-value
of interval,
  years.         g.    x.        xg.    (x+1)g.       x²g.   (x+1)²g.         x³g.    (x+1)³g.          x⁴g.      (x+1)⁴g.

    16.5        294    -4     -1,176       -882      4,704      2,646      -18,816      -7,938        75,264        23,814
    19.5     10,995    -3    -32,985    -21,990     98,955     43,980     -296,865     -87,960       890,595       175,920
    22.5     61,001    -2   -122,002    -61,001    244,004     61,001     -488,008     -61,001       976,016        61,001
    25.5     73,054    -1    -73,054          0     73,054          0      -73,054           0        73,054             0
    28.5     56,501     0          0     56,501          0     56,501            0      56,501             0        56,501
    31.5     33,478     1     33,478     66,956     33,478    133,912       33,478     267,824        33,478       535,648
    34.5     20,569     2     41,138     61,707     82,276    185,121      164,552     555,363       329,104     1,666,089
    37.5     14,281     3     42,843     57,124    128,529    228,496      385,587     913,984     1,156,761     3,655,936
    40.5      9,320     4     37,280     46,600    149,120    233,000      596,480   1,165,000     2,385,920     5,825,000
    43.5      6,236     5     31,180     37,416    155,900    224,496      779,500   1,346,976     3,897,500     8,081,856
    46.5      4,770     6     28,620     33,390    171,720    233,730    1,030,320   1,636,110     6,181,920    11,452,770
    49.5      3,620     7     25,340     28,960    177,380    231,680    1,241,660   1,853,440     8,691,620    14,827,520
    52.5      2,190     8     17,520     19,710    140,160    177,390    1,121,280   1,596,510     8,970,240    14,368,590
    55.5      1,655     9     14,895     16,550    134,055    165,500    1,206,495   1,655,000    10,858,455    16,550,000
    58.5      1,100    10     11,000     12,100    110,000    133,100    1,100,000   1,464,100    11,000,000    16,105,100
    61.5        810    11      8,910      9,720     98,010    116,640    1,078,110   1,399,680    11,859,210    16,796,160
    64.5        649    12      7,788      8,437     93,456    109,681    1,121,472   1,425,853    13,457,664    18,536,089
    67.5        487    13      6,331      6,818     82,303     95,452    1,069,939   1,336,328    13,909,207    18,708,592
    70.5        326    14      4,564      4,890     63,896     73,350      894,544   1,100,250    12,523,616    16,503,750
    73.5        211    15      3,165      3,376     47,475     54,016      712,125     864,256    10,681,875    13,828,096
    76.5        119    16      1,904      2,023     30,464     34,391      487,424     584,647     7,798,784     9,938,999
    79.5         73    17      1,241      1,314     21,097     23,652      358,649     425,736     6,097,033     7,663,248
    82.5         27    18        486        513      8,748      9,747      157,464     185,193     2,834,352     3,518,667
    85.5         14    19        266        280      5,054      5,600       96,026     112,000     1,824,494     2,240,000
    88.5          5    20        100        105      2,000      2,205       40,000      46,305       800,000       972,405

Totals
or + terms  301,785          318,049    474,490  2,155,838  2,635,287   13,675,105  19,991,056   137,306,162   202,091,751
- terms                     -229,217    -83,873                           -876,743    -156,899

From this table we find

$$ \Sigma(xg) = 88,832 $$
$$ \Sigma(x^2 g) = 2,155,838 $$
$$ \Sigma(x^3 g) = 12,798,362 $$
$$ \Sigma(x^4 g) = 137,306,162. $$

The values will be found to check and we have, about the working mean, on dividing by
the total frequency 301,785,

$$ \mu'_1 = 0.294,355,253 $$
$$ \mu'_2 = 7.143,622,115 $$
$$ \mu'_3 = 42.408,873,867 $$
$$ \mu'_4 = 454.980,075,219. $$

For the moments about the mean, substitution in equations (3.9) gives

$$ \mu_2 = 7.056,977 $$
$$ \mu_3 = 36.151,595 $$
$$ \mu_4 = 408.738,210. $$

These are expressed in class-intervals, which are units of three years. To express the
results in units of one year we multiply the rth moment by $3^r$, e.g.

$$ \mu_2 = 7.056,977 \times 9 = 63.512,79. $$
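The whole of Example 3.1 may be retraced mechanically. The frequencies below are those of Table 3.1, read so that they reproduce the total frequency 301,785 and the sums quoted in the text; the variable names are illustrative, and the check on $(x + 1)^3$ mirrors the identity used in the Example.

```python
# Frequencies g for x = -4, -3, ..., 20 (working mean at 28.5 years, unit 3 years).
freqs = [294, 10995, 61001, 73054, 56501, 33478, 20569, 14281, 9320, 6236,
         4770, 3620, 2190, 1655, 1100, 810, 649, 487, 326, 211, 119, 73,
         27, 14, 5]
xs = list(range(-4, 21))

N = sum(freqs)
S = {r: sum(g * x ** r for g, x in zip(freqs, xs)) for r in range(1, 5)}

# Arithmetic check of the kind used in the Example.
check3 = sum(g * (x + 1) ** 3 for g, x in zip(freqs, xs))
identity_holds = check3 == S[3] + 3 * S[2] + 3 * S[1] + N

# Moments about the working mean, then about the mean by equations (3.9).
m1, m2, m3, m4 = (S[r] / N for r in range(1, 5))
mu2 = m2 - m1 ** 2
mu3 = m3 - 3 * m1 * m2 + 2 * m1 ** 3
mu4 = m4 - 4 * m1 * m3 + 6 * m1 ** 2 * m2 - 3 * m1 ** 4

mu2_years = mu2 * 3 ** 2     # second moment in units of one year
```

The exact integer sums serve as a check on the table itself, since any misread frequency would disturb them.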

3.4. If a distribution is specified mathematically the determination of moments is
equivalent to the evaluation of certain sums or integrals. It is usually necessary to consider
whether the moments exist. Some examples will illustrate the general principles
involved.


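Example 3.2

Consider the so-called binomial distribution $(q + p)^n$, q + p = 1, in which the frequencies of
values 0, h, 2h, . . . are the successive terms in the expansion of the distribution, i.e. are

$$ q^n, \quad \binom{n}{1} q^{n-1} p, \quad \binom{n}{2} q^{n-2} p^2, \quad \ldots $$

Taking an origin at the first term and working in units of h, we have

$$ \mu'_1 = \sum_{j=0}^{n} j \binom{n}{j} q^{n-j} p^j $$

which may be written, p and q being treated as independent in the differentiation and
q + p put equal to unity afterwards,

$$ \mu'_1 = \left(p \frac{\partial}{\partial p}\right) (q + p)^n $$
$$ = np(q + p)^{n-1} $$
$$ = np. $$

Similarly

$$ \mu'_2 = \left(p \frac{\partial}{\partial p}\right)^2 (q + p)^n $$
$$ = np(q + p)^{n-1} + n(n - 1)p^2 (q + p)^{n-2} $$
$$ = n^2 p^2 + npq. $$

Hence

$$ \mu_2 = npq. $$

Similarly

$$ \mu'_3 = \left(p \frac{\partial}{\partial p}\right)^3 (q + p)^n, $$

etc., and it will be found that

$$ \mu_3 = npq(q - p) $$
$$ \mu_4 = 3p^2 q^2 n^2 + pqn(1 - 6pq). $$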
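The results of Example 3.2 may be confirmed by direct summation over the terms of the binomial. The values n = 12 and p = 0.3 below are arbitrary illustrative choices, and the function name is hypothetical.

```python
from math import comb

def binomial_moments(n, p):
    """Mean and second to fourth moments about the mean, by direct summation."""
    q = 1.0 - p
    probs = [comb(n, j) * q ** (n - j) * p ** j for j in range(n + 1)]
    mean = sum(j * w for j, w in enumerate(probs))
    central = [sum((j - mean) ** r * w for j, w in enumerate(probs))
               for r in (2, 3, 4)]
    return mean, central[0], central[1], central[2]

n, p = 12, 0.3          # illustrative values only
q = 1.0 - p
mean, mu2, mu3, mu4 = binomial_moments(n, p)

# Closed forms quoted in the Example.
expected_mu2 = n * p * q
expected_mu3 = n * p * q * (q - p)
expected_mu4 = 3 * p ** 2 * q ** 2 * n ** 2 + p * q * n * (1 - 6 * p * q)
```

Agreement for one pair (n, p) is of course no proof, but it is a quick guard against slips in the algebra.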


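Example 3.3

Consider the distribution

$$ dF = \frac{k}{(1 + x^2)^m} \, dx, \quad -\infty \le x \le \infty, \quad m \ge 1. $$

This is a unimodal distribution symmetrical about x = 0. All existent moments of odd
order about the origin therefore vanish. The constant k is given by the equation

$$ 1 = k \int_{-\infty}^{\infty} \frac{dx}{(1 + x^2)^m} = k \, B(\tfrac{1}{2}, m - \tfrac{1}{2}) = k \, \frac{\Gamma(\tfrac{1}{2}) \Gamma(m - \tfrac{1}{2})}{\Gamma(m)}. $$

The moment about the mean of order 2r, if it exists, is given by

$$ \mu_{2r} = k \int_{-\infty}^{\infty} \frac{x^{2r}}{(1 + x^2)^m} \, dx $$

and this integral converges if and only if

$$ 2m > 2r + 1. $$

Thus the moments about the origin of order < (2m - 1) exist and those of higher order
do not.

If m = 1 it may be noted that the integral $\int_{-\infty}^{\infty} \frac{k x \, dx}{1 + x^2}$ is not completely convergent,
i.e. $\lim_{a, b \to \infty} \int_{-a}^{b} \frac{x \, dx}{1 + x^2}$ does not exist, although the principal value

$$ \lim_{a \to \infty} \int_{-a}^{a} \frac{x \, dx}{1 + x^2} $$

does exist and is equal to zero. It is a matter of convention whether we regard the distribution
as possessing a mean in this case. For m > 1 the mean exists and is located at
the origin.

Making the substitution $z = \frac{1}{1 + x^2}$ in the formula for $\mu_{2r}$ we find

$$ \mu_{2r} = k \int_0^1 (1 - z)^{r - \frac{1}{2}} z^{m - r - \frac{3}{2}} \, dz $$

$$ = k \, \frac{\Gamma(r + \tfrac{1}{2}) \Gamma(m - r - \tfrac{1}{2})}{\Gamma(m)} $$

and on substituting for k,

$$ \mu_{2r} = \frac{\Gamma(r + \tfrac{1}{2}) \Gamma(m - r - \tfrac{1}{2})}{\Gamma(\tfrac{1}{2}) \Gamma(m - \tfrac{1}{2})}, \quad \text{if } 2m > 2r + 1. $$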


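Example 3.4

Consider the "normal" distribution

$$ dF = \frac{1}{\sigma \sqrt{2\pi}} e^{-x^2 / 2\sigma^2} \, dx, \quad -\infty \le x \le \infty. $$

This is symmetrical about the origin. All moments exist, those of odd order vanishing.
Thus

$$ \mu_{2r} = \frac{1}{\sigma \sqrt{2\pi}} \int_{-\infty}^{\infty} x^{2r} e^{-x^2 / 2\sigma^2} \, dx. $$

This may be evaluated by partial integration, but a more direct method is as follows:
Consider the integral

$$ M(t) = \frac{1}{\sigma \sqrt{2\pi}} \int_{-\infty}^{\infty} e^{tx} e^{-x^2 / 2\sigma^2} \, dx = e^{\frac{1}{2} \sigma^2 t^2}. $$

We have, for all real values of t,

$$ e^{tx} e^{-x^2 / 2\sigma^2} = \sum_{r=0}^{\infty} \frac{t^r x^r}{r!} e^{-x^2 / 2\sigma^2}. $$

The series on the right is uniformly convergent in x and may be integrated term by term
if the resulting series is uniformly convergent. We then have

$$ M(t) = \sum_{r=0}^{\infty} \mu_r \frac{t^r}{r!}. $$

In other words, $\mu_{2r}$ is the coefficient of $\frac{t^{2r}}{(2r)!}$ in $e^{\frac{1}{2} \sigma^2 t^2}$, and hence

$$ \mu_{2r} = \frac{\sigma^{2r} (2r)!}{2^r \, r!}. $$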

Moment-generating Functions and Characteristic Functions

3.5. The previous example shows that in some cases we can derive from the distribution or the frequency function a function $M(t)$ which, when expanded in powers of $t$, will yield the moments of the distribution as the coefficients of those powers. Such a function is accordingly called a Moment-generating Function. It will be discussed more fully in the next chapter.

For many frequency functions the integral $\int e^{tx}\,dF$ or the sum $\sum\{e^{tx_j} f(x_j)\}$ may not exist for real values of $t$. This is, for example, true of the function $dF = k(1 + x^2)^{-m}\,dx$ for finite positive values of $m$. A more serviceable auxiliary function is

\[ \phi(t) = \int_{-\infty}^{\infty} e^{itx}\,dF \qquad (t \text{ real}). \qquad (3.10) \]


This is known as the Characteristic Function and is of great theoretical importance. It 
will be seen in Chapter 4 that under certain general conditions the characteristic function 
determines and is completely determined by the distribution function. It also yields 
many valuable results in the theory of sampling. 

Since by the nature of the distribution function the integral $\int_{-\infty}^{\infty} dF$ converges,

\[ |\phi(t)| \le \int_{-\infty}^{\infty} |e^{itx}|\,dF \le 1 \]

and hence the Stieltjes integral (3.10) converges absolutely and uniformly in $t$. It may therefore be integrated under the integral or summation signs with respect to $t$ and may be differentiated provided that the resulting expressions exist and are uniformly convergent. We have, for example, writing $D_t$ for $\dfrac{d}{dt}$,

\[ D_t^r\,\phi(t) = i^r\int_{-\infty}^{\infty} x^r e^{itx}\,dF \]

and hence, putting $t = 0$,

\[ \mu'_r = (-i)^r\left[D_t^r\,\phi(t)\right]_{t=0} \qquad (3.11) \]

provided that $\mu'_r$ exists. If $\phi(t)$ be expanded in powers of $t$, $\mu'_r$ must thus be equal to the coefficient of $\dfrac{(it)^r}{r!}$ in the expansion. Thus the characteristic function is also a moment-generating function.



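Example 3.5

Consider again the binomial $(q + p)^n$. Taking $h$ as unit, we have

\[ \phi(t) = \sum_{j=0}^{n}\binom{n}{j} q^{n-j} p^{j} e^{itj} = (q + pe^{it})^n. \]

Hence

\[ \mu'_1 = (-i)\left[D_t\,(q + pe^{it})^n\right]_{t=0} = np, \]

\[ \mu'_2 = (-i)^2\left[D_t^2\,(q + pe^{it})^n\right]_{t=0} = np + n(n-1)p^2, \]

and so on.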


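Example 3.6

Consider the distribution

\[ dF = \frac{a^{\gamma}}{\Gamma(\gamma)}\,x^{\gamma-1} e^{-ax}\,dx, \qquad a > 0,\ \gamma > 0; \quad 0 \le x \le \infty, \]

which is known as Pearson's Type III (cf. Chapter 6). The distribution may have a variety of shapes, depending on the value of $\gamma$, but moments of all orders exist in virtue of the convergence of the integral $\int_0^{\infty} x^{\gamma+r-1} e^{-ax}\,dx$, the $\Gamma$-function integral. We have then, for the characteristic function,

\[ \phi(t) = \frac{a^{\gamma}}{\Gamma(\gamma)}\int_0^{\infty} e^{itx - ax}\,x^{\gamma-1}\,dx. \]

By the substitution $z = x(a - it)$ this becomes

\[ \phi(t) = \frac{a^{\gamma}}{\Gamma(\gamma)(a - it)^{\gamma}}\int_0^{\infty} e^{-z} z^{\gamma-1}\,dz = \frac{1}{\left(1 - \dfrac{it}{a}\right)^{\gamma}}, \]

since $\int_0^{\infty} e^{-z} z^{\gamma-1}\,dz = \Gamma(\gamma)$ whether $z$ is real or complex.

Hence

\[ \phi(t) = 1 + \gamma\,\frac{it}{a} + \frac{\gamma(\gamma+1)}{2!}\,\frac{(it)^2}{a^2} + \frac{\gamma(\gamma+1)(\gamma+2)}{3!}\,\frac{(it)^3}{a^3} + \cdots \]

and thus

\[ \mu'_1 = \frac{\gamma}{a}, \qquad \mu'_2 = \frac{\gamma(\gamma+1)}{a^2}, \qquad \mu'_3 = \frac{\gamma(\gamma+1)(\gamma+2)}{a^3}, \]

and so on. In particular,

\[ \mu_2 = \mu'_2 - \mu'^{\,2}_1 = \frac{\gamma}{a^2}. \]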

Absolute Moments

3.6. The quantity

\[ \nu'_r = \int_{-\infty}^{\infty} |x - a|^r\,dF \qquad (3.12) \]

is called the absolute moment of order $r$ about $a$. The absolute moments about the mean are written without primes.

If $r$ is even, the absolute moment is clearly equal to the ordinary moment, and if the range of the distribution is positive the absolute moments about any point to the left of the start of the distribution are equal to the ordinary moments of corresponding order.

There are some interesting inequalities concerning the absolute moments. Referring to the function $E(t)$ of 2.7 and remembering that it is a non-decreasing function of $t$, we find, on putting $t = 1, 2, \ldots$, that

\[ \nu_1 \le \nu_2^{1/2} \le \nu_3^{1/3} \le \cdots \le \nu_r^{1/r}. \qquad (3.13) \]

A more general inequality, due to Liapounoff, is

\[ \nu_b^{\,a-c} \le \nu_c^{\,a-b}\,\nu_a^{\,b-c}, \qquad a \ge b \ge c \ge 0. \qquad (3.14) \]

A proof of this result is sketched in Exercise 3.14. In particular, putting $b = \tfrac12(a + c)$ we find

\[ \nu_b^2 \le \nu_a\,\nu_c. \qquad (3.15) \]

Further, putting $c = 0$ in (3.14) we find

\[ \nu_b^{\,a} \le \nu_a^{\,b} \qquad \text{or} \qquad \nu_b^{1/b} \le \nu_a^{1/a}, \]

which is equivalent to (3.13).


Factorial Moments

3.7. The factorial expression

\[ x(x - h)(x - 2h)\cdots(x - \overline{r-1}\,h) \]

may conveniently be written $x^{[r]}$, a notation which brings out an analogy with the power $x^r$. Taking first differences with respect to $x$ and with unit $h$, we have

\[ \Delta x^{[r]} = (x + h)^{[r]} - x^{[r]} = (x+h)x(x-h)\cdots(x - \overline{r-2}\,h) - x(x-h)\cdots(x - \overline{r-1}\,h) = r\,x^{[r-1]}\,h, \]

which may be compared with the differential equation

\[ dx^r = r x^{r-1}\,dx. \]

Conversely, for the indefinite sum taken at intervals of $h$,

\[ \sum x^{[r]} = \frac{x^{[r+1]}}{(r+1)h}, \]

corresponding to

\[ \int x^r\,dx = \frac{x^{r+1}}{r+1}. \]

The $r$th factorial moment about an arbitrary origin may then be defined by the equation

\[ \mu'_{[r]} = \sum_{j} (x_j - a)^{[r]} f(x_j), \qquad (3.16) \]

where we have chosen the summation sign $\Sigma$ rather than the Stieltjes integral because it is almost entirely for discontinuous distributions, or continuous distributions grouped in intervals of width $h$, that the factorial moments are used. In statistical theory they are not very prominent, but in the theory of interpolation and of curve fitting they are sufficiently important to justify some mention of their properties.

As usual, when it is necessary to distinguish between factorial moments about the mean and those about an arbitrary point we may write the former without the prime.


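3.8. The factorial moments obey laws of transformation similar to those of equation (3.5) governing ordinary moments. In fact we have the expansion *

\[ (a + b)^{[r]} = \sum_{j=0}^{r}\binom{r}{j}\,a^{[r-j]}\,b^{[j]} \]

and hence

\[ (x - a)^{[r]} = (x - b + c)^{[r]} = \sum_{j=0}^{r}\binom{r}{j}(x - b)^{[r-j]}\,c^{[j]}, \qquad \text{where } c = b - a, \]

and hence

\[ \mu'_{[r]}(a) = \sum_{j=0}^{r}\binom{r}{j}\,\mu'_{[r-j]}(b)\,c^{[j]}, \qquad (3.17) \]

which may be written symbolically

\[ \mu'_{[r]}(a) = \{\mu'(b) + c\}^{[r]}. \]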

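3.9. By direct expansion of (3.16) it is seen that

\[ \begin{aligned}
\mu'_{[1]} &= \mu'_1 \\
\mu'_{[2]} &= \mu'_2 - h\mu'_1 \\
\mu'_{[3]} &= \mu'_3 - 3h\mu'_2 + 2h^2\mu'_1 \\
\mu'_{[4]} &= \mu'_4 - 6h\mu'_3 + 11h^2\mu'_2 - 6h^3\mu'_1
\end{aligned} \qquad (3.18) \]

and conversely that

\[ \begin{aligned}
\mu'_2 &= \mu'_{[2]} + h\mu'_{[1]} \\
\mu'_3 &= \mu'_{[3]} + 3h\mu'_{[2]} + h^2\mu'_{[1]} \\
\mu'_4 &= \mu'_{[4]} + 6h\mu'_{[3]} + 7h^2\mu'_{[2]} + h^3\mu'_{[1]}
\end{aligned} \qquad (3.19) \]

Since the first moments are equal the equations remain true when the primes are dropped and terms in first moments omitted.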
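Equations (3.18) and (3.19) are polynomial identities and can be checked on any discrete distribution. The sketch below assumes an arbitrary illustrative distribution on a lattice of width `h`; the helper `falling_power` is a hypothetical name for the factorial expression $x^{[r]}$.

```python
def falling_power(x, r, h):
    """x(x - h)(x - 2h)...(x - (r-1)h), written x^[r] in the text."""
    out = 1.0
    for k in range(r):
        out *= x - k * h
    return out

h = 0.5
xs = [0.5, 1.0, 1.5, 2.0, 2.5]          # illustrative variate-values
fs = [0.1, 0.2, 0.4, 0.2, 0.1]          # illustrative proportional frequencies

# ordinary and factorial moments about the origin, orders 0..4
m = [sum(x ** r * f for x, f in zip(xs, fs)) for r in range(5)]
fm = [sum(falling_power(x, r, h) * f for x, f in zip(xs, fs)) for r in range(5)]

# last line of (3.18) and last line of (3.19)
fm4_from_318 = m[4] - 6 * h * m[3] + 11 * h ** 2 * m[2] - 6 * h ** 3 * m[1]
m4_from_319 = fm[4] + 6 * h * fm[3] + 7 * h ** 2 * fm[2] + h ** 3 * fm[1]
print(fm[4], fm4_from_318)
print(m[4], m4_from_319)
```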


* It is clear that $(a + b)^{[r]}$ will be a polynomial of degree $r$ in $a$, and may therefore be equated to $\sum_{j=0}^{r} k_j a^{[j]}$, where the $k$'s are polynomials in $b$ and $h$ but do not contain $a$. Putting $a = 0$ we obtain $b^{[r]} = k_0$. Taking first differences with respect to $a$ and putting $a = 0$ we obtain $r\,b^{[r-1]} = k_1$. Successive differences give the $k$'s and the above result follows.





It is possible to give general formulae showing the factorial moments about one point in terms of the ordinary moments about another, and vice-versa. In fact

(3.20)

(3.21)

where $B_r^{(n)}(x)$ is the Bernoulli polynomial of order $n$ and degree $r$ in $x$, defined as the coefficient of $\dfrac{t^r}{r!}$ in $\left(\dfrac{t}{e^t - 1}\right)^n e^{xt}$. For a discussion of these polynomials and the derivation of equations (3.20) and (3.21) reference may be made to Frisch (1926).


Calculation of Factorial Moments 

3.10. The calculation of factorial moments for grouped data may be effected by a process of progressive summation which is illustrated in Table 3.2.

TABLE 3.2

(1)         (2)                        (3)                                  (4)
Frequency   First Summation            Second Summation                     Third Summation

f_1         f_1 + ... + f_n            -                                    -
f_2         f_2 + ... + f_n            f_2 + 2f_3 + ... + (n-1)f_n          -
f_3         f_3 + ... + f_n            f_3 + 2f_4 + ... + (n-2)f_n          f_3 + 3f_4 + ... + [(n-1)(n-2)/2] f_n
f_4         f_4 + f_5 + ... + f_n      f_4 + 2f_5 + ... + (n-3)f_n          f_4 + 3f_5 + ... + [(n-2)(n-3)/2] f_n
...         ...                        ...                                  ...
f_{n-2}     f_{n-2} + f_{n-1} + f_n    f_{n-2} + 2f_{n-1} + 3f_n            f_{n-2} + 3f_{n-1} + 6f_n
f_{n-1}     f_{n-1} + f_n              f_{n-1} + 2f_n                       f_{n-1} + 3f_n
f_n         f_n                        f_n                                  f_n

Totals      S(j f_j) = mu'_1           S{j(j-1)/2 f_j} = (1/2!) mu'_[2]     S{j(j-1)(j-2)/3! f_j} = (1/3!) mu'_[3]

Writing the proportional frequencies in the successive $n$ intervals as $f_1, \ldots, f_n$ as shown in the left-hand column, we construct column 2 by adding frequencies from the bottom. In the $n$th row we write $f_n$, in the $(n-1)$th row the sum $f_n + f_{n-1}$, in the $(n-2)$th row the sum $f_n + f_{n-1} + f_{n-2}$, and so on, the first row containing the sum

\[ f_n + f_{n-1} + \cdots + f_1. \]

In column 3 the process is repeated with the rows of column 2, stopping at the second row, e.g. the $n$th row contains $f_n$, the $(n-1)$th row $(f_n + f_{n-1}) + f_n = 2f_n + f_{n-1}$, and so on, the second row containing the sum $(n-1)f_n + (n-2)f_{n-1} + \cdots + 2f_3 + f_2$.

Column 4 repeats the process with the entries of column 3, but stopping at the third row; and so on.

Consider now the sum of the entries in column 2. In that sum $f_1$ appears once, $f_2$ twice, $\ldots$, $f_n$ $n$ times. Hence the sum is equal to

\[ \sum_{j=1}^{n} j f_j = \mu'_1. \]

In column 3, $f_2$ appears once, $f_3$ 3 times, $\ldots$, $f_n$ $\tfrac12 n(n-1)$ times. Hence

\[ \text{Sum} = \sum_{j=1}^{n} \tfrac12 j(j-1) f_j = \tfrac12\,\mu'_{[2]}. \]

In general, the sum of the $(r+1)$th column will be given by

\[ \text{Sum} = \frac{1}{r!}\,\mu'_{[r]}. \]

If the actual frequencies are used instead of the proportional frequencies the sums have to be divided by the total frequency $N$.

Thus the process of summation gives the factorial moments directly. It is a modification of one which is due to G. F. Hardy (cf. Elderton, 1938a). The use of the method in practice lies in the fact that for certain calculating machines the progressive summation is easier to carry out than the processes involved in the method of Example 3.1.
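The progressive summation can be sketched in a few lines. The following is an illustration only, assuming unit interval width and an origin one interval below the first class (so that the classes take the values 1, 2, ..., n); the function name `progressive_totals` and the toy frequencies are hypothetical.

```python
from math import comb

def progressive_totals(freqs, max_order):
    """Totals of the successive summation columns, built from the bottom upwards."""
    column, totals = list(freqs), []
    for r in range(1, max_order + 1):
        suffix, running = [], 0
        for value in reversed(column):       # add from the bottom
            running += value
            suffix.append(running)
        suffix.reverse()
        # the first summation keeps every row; each later column stops one row lower
        column = suffix if r == 1 else suffix[1:]
        totals.append(sum(column))
    return totals

freqs = [1, 2, 3]                            # toy actual frequencies at values 1, 2, 3
totals = progressive_totals(freqs, 3)
# direct evaluation of sum_j C(j, r) f_j, which the (r+1)th column total should equal
direct = [sum(comb(j, r) * f for j, f in enumerate(freqs, start=1)) for r in (1, 2, 3)]
print(totals, direct)
```

Multiplying the $r$th total by $r!$ and dividing by the total frequency gives the factorial moment of order $r$.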


Example 3.7

Consider again the data of Table 1.7, showing the distribution of 8585 men according to height in inches. The columns on the right in Table 3.3 overleaf show the successive sums. At the top of each column there has been placed within brackets the number which would have been obtained if the summation were continued up the column one place further than is required for the sum at the foot. These bracketed figures are useful to have as a check since each must equal the sum at the foot of the preceding column.

From this table we find

\[ \begin{aligned}
\mu'_{[1]} &= 11.020,850,320,33 \\
\mu'_{[2]} &= 117.055,096,097,84 \\
\mu'_{[3]} &= 1,194.957,483,983,69 \\
\mu'_{[4]} &= 11,702.727,082,119,98
\end{aligned} \]

From these values we may derive the ordinary moments, using equations (3.19), and find

\[ \begin{aligned}
\mu'_1 &= 11.020,850,320,33 \\
\mu'_2 &= 128.075,946,418,2 \\
\mu'_3 &= 1,557.143,622,597,5 \\
\mu'_4 &= 19,702.878,509,027,3,
\end{aligned} \]

from which we find, for the moments about the mean,

\[ \begin{aligned}
\mu_2 &= 6.616,805 \\
\mu_3 &= -0.207,840 \\
\mu_4 &= 137.689,185,
\end{aligned} \]

the units being one inch.
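The figures of this example can be reproduced from the frequencies of Table 3.3. The sketch below is an illustration under the assumption of unit interval width with the origin one interval below the 57- class; the function name `column_totals` is hypothetical, and exact rational arithmetic is used only to avoid rounding in the intermediate steps.

```python
from fractions import Fraction

# frequencies for the height classes 57- to 77- (Table 3.3)
freqs = [2, 4, 14, 41, 83, 169, 394, 669, 990, 1223, 1329,
         1230, 1063, 646, 392, 202, 79, 32, 16, 5, 2]

def column_totals(freqs, max_order):
    """Totals at the foot of the first, second, ... summation columns."""
    column, totals = list(freqs), []
    for r in range(1, max_order + 1):
        suffix, running = [], 0
        for value in reversed(column):
            running += value
            suffix.append(running)
        suffix.reverse()
        column = suffix if r == 1 else suffix[1:]
        totals.append(sum(column))
    return totals

N = sum(freqs)
totals = column_totals(freqs, 4)
# factorial moments: r! times the column total, divided by N
f1, f2, f3, f4 = (Fraction(k * t, N) for k, t in zip((1, 2, 6, 24), totals))
# ordinary moments from equations (3.19) with h = 1
m1, m2, m3, m4 = f1, f2 + f1, f3 + 3 * f2 + f1, f4 + 6 * f3 + 7 * f2 + f1
# moments about the mean
mu2 = m2 - m1 ** 2
mu3 = m3 - 3 * m2 * m1 + 2 * m1 ** 3
mu4 = m4 - 4 * m3 * m1 + 6 * m2 * m1 ** 2 - 3 * m1 ** 4
print(N, totals)
print(float(mu2), float(mu3), float(mu4))
```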





TABLE 3.3

Calculation of the Factorial Moments of a Distribution of Men according to Height in Inches (Table 1.7).

Height.   Frequency.   First Sum.   Second Sum.   Third Sum.    Fourth Sum.

57-             2         8,585      (94,614)         -              -
58-             4         8,583       86,029      (502,459)          -
59-            14         8,579       77,446       416,430     (1,709,785)
60-            41         8,565       68,867       338,984      1,293,355
61-            83         8,524       60,302       270,117        954,371
62-           169         8,441       51,778       209,815        684,254
63-           394         8,272       43,337       158,037        474,439
64-           669         7,878       35,065       114,700        316,402
65-           990         7,209       27,187        79,635        201,702
66-         1,223         6,219       19,978        52,448        122,067
67-         1,329         4,996       13,759        32,470         69,619
68-         1,230         3,667        8,763        18,711         37,149
69-         1,063         2,437        5,096         9,948         18,438
70-           646         1,374        2,659         4,852          8,490
71-           392           728        1,285         2,193          3,638
72-           202           336          557           908          1,445
73-            79           134          221           351            537
74-            32            55           87           130            186
75-            16            23           32            43             56
76-             5             7            9            11             13
77-             2             2            2             2              2

Totals      8,585        94,614      502,459     1,709,785      4,186,163


Cumulants 


3.11. The moments are a set of parameters of a distribution which are useful for measuring its properties and, in certain circumstances, for specifying it. Their use in these connections will be considered in later chapters. They are not, however, the only set of parameters for the purpose, or even the best set. Another series of parameters, the so-called cumulants, have properties which are more useful from the theoretical standpoint. Formally, the cumulants $\kappa_1, \kappa_2, \ldots, \kappa_r$ are defined by the identity in $t$

\[ \exp\left\{\kappa_1 t + \frac{\kappa_2 t^2}{2!} + \cdots + \frac{\kappa_r t^r}{r!} + \cdots\right\} = 1 + \mu'_1 t + \frac{\mu'_2 t^2}{2!} + \cdots + \frac{\mu'_r t^r}{r!} + \cdots \qquad (3.22) \]

It is sometimes more convenient to write the same equation with $it$ for $t$, thus:

\[ \exp\left\{\kappa_1 (it) + \frac{\kappa_2 (it)^2}{2!} + \cdots + \frac{\kappa_r (it)^r}{r!} + \cdots\right\} = 1 + \mu'_1 (it) + \cdots + \frac{\mu'_r (it)^r}{r!} + \cdots = \phi(t). \qquad (3.23) \]

Thus, whereas $\mu'_r$ is the coefficient of $\dfrac{(it)^r}{r!}$ in $\phi(t)$, the characteristic function, $\kappa_r$ is the coefficient of $\dfrac{(it)^r}{r!}$ in $\log\phi(t)$, if an expansion in power series exists.

3.12. If in equation (3.23) the origin is changed from $a$ to $b$, where as usual $b - a = c$, the effect on $\phi(t)$ is to multiply it by $e^{-itc}$, for $e^{itx}$ becomes $e^{it(x-c)}$. Hence the effect on $\log\phi(t)$ is merely to add the term $-itc$, and consequently the coefficients in $\log\phi(t)$ are unchanged, except the first, which is decreased by $c$.

Hence the cumulants are invariant under change of origin, except the first. In this they stand in sharp contrast to the moments about an arbitrary point.

Both cumulants and moments have another property of an invariantive kind, namely, that if the variate-values are multiplied by a constant $a$, $\mu'_r$ and $\kappa_r$ are multiplied by $a^r$. This is at once evident from their definitions. Thus any linear transformation of the kind

\[ \xi = lx + m \qquad (3.24) \]

leaves the cumulants unchanged so far as the constant $m$ is concerned and multiplies $\kappa_r$ by $l^r$. The sole exception is the first cumulant which is equal to the mean. In particular, if we transform a distribution to standard measure, the only effect is to multiply $\kappa_r$ by $\sigma^{-r}$, $\sigma$ being the standard deviation and, as we shall see in a moment, being equal to $\kappa_2^{1/2}$.

These invariantive properties of the cumulants were the origin of their original name of semi-invariants, seminvariants or half-invariants (Thiele, 1889). It has, however, recently been shown that there are several other classes of parameters with the same property, and it seems best to reserve the word "seminvariant" for any parameter $\lambda_r$ which, under the transformation (3.24), is multiplied by $l^r$. The cumulants and the moments about the mean are thus particular cases of seminvariants.


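Relations between Moments and Cumulants

3.13. Subject to conditions of existence, we have, from (3.22)

\[ 1 + \mu'_1 t + \cdots + \frac{\mu'_r t^r}{r!} + \cdots = \exp(\kappa_1 t)\,\exp\left(\frac{\kappa_2 t^2}{2!}\right)\cdots\exp\left(\frac{\kappa_r t^r}{r!}\right)\cdots \]

Picking out the terms in the exponential expansions which, when multiplied together, give a power of $t^r$, we have

\[ \mu'_r = \sum_{m=1}^{r}\sum\left(\frac{\kappa_{p_1}}{p_1!}\right)^{\pi_1}\left(\frac{\kappa_{p_2}}{p_2!}\right)^{\pi_2}\cdots\left(\frac{\kappa_{p_m}}{p_m!}\right)^{\pi_m}\frac{r!}{\pi_1!\,\pi_2!\cdots\pi_m!}, \qquad (3.25) \]

where the second summation extends over all non-negative values of the $\pi$'s such that

\[ p_1\pi_1 + p_2\pi_2 + \cdots + p_m\pi_m = r. \qquad (3.26) \]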

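It is worth noting that the rather tedious process of writing down the explicit relations for particular values of $r$ may be shortened considerably. In fact, differentiating (3.22) by $\kappa_j$ we have

\[ \frac{t^j}{j!}\left(1 + \mu'_1 t + \cdots + \frac{\mu'_r t^r}{r!} + \cdots\right) = \frac{\partial\mu'_1}{\partial\kappa_j}\,t + \cdots + \frac{\partial\mu'_r}{\partial\kappa_j}\,\frac{t^r}{r!} + \cdots \]

and hence, identifying powers of $t$,

\[ \frac{\partial\mu'_r}{\partial\kappa_j} = \binom{r}{j}\mu'_{r-j}. \qquad (3.27) \]

In particular

\[ \frac{\partial\mu'_r}{\partial\kappa_1} = r\mu'_{r-1}, \qquad (3.28) \]

and thus, given any $\mu'_r$ in terms of the $\kappa$'s we can write down successively those of lower orders by a differentiation.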
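For numerical work the moments can also be built up from the cumulants order by order. The sketch below uses the recursion $\mu'_r = \sum_{j=1}^{r}\binom{r-1}{j-1}\kappa_j\,\mu'_{r-j}$, which follows on differentiating (3.22) with respect to $t$; this recursion is a standard identity and is not stated in the text, and the function name `moments_from_cumulants` is hypothetical.

```python
from math import comb

def moments_from_cumulants(kappa):
    """Given [kappa_1, ..., kappa_n], return [mu'_1, ..., mu'_n] about the same origin."""
    mu = [1]                                  # mu'_0 = 1
    for r in range(1, len(kappa) + 1):
        mu.append(sum(comb(r - 1, j - 1) * kappa[j - 1] * mu[r - j]
                      for j in range(1, r + 1)))
    return mu[1:]

# With every cumulant equal to 1 the r-th moment is the sum of the numerical
# coefficients in the expression for mu'_r in terms of the kappas.
unit = moments_from_cumulants([1] * 10)
# Spot check of the fourth-order relation with arbitrary integer cumulants.
k1, k2, k3, k4 = 2, 3, 5, 7
fourth = moments_from_cumulants([k1, k2, k3, k4])[3]
print(unit)
print(fourth, k4 + 4 * k3 * k1 + 3 * k2 ** 2 + 6 * k2 * k1 ** 2 + k1 ** 4)
```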

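The first ten of these expressions are, for moments about an arbitrary point:

\[ \begin{aligned}
\mu'_1 ={}& \kappa_1, \\
\mu'_2 ={}& \kappa_2 + \kappa_1^2, \\
\mu'_3 ={}& \kappa_3 + 3\kappa_2\kappa_1 + \kappa_1^3, \\
\mu'_4 ={}& \kappa_4 + 4\kappa_3\kappa_1 + 3\kappa_2^2 + 6\kappa_2\kappa_1^2 + \kappa_1^4, \\
\mu'_5 ={}& \kappa_5 + 5\kappa_4\kappa_1 + 10\kappa_3\kappa_2 + 10\kappa_3\kappa_1^2 + 15\kappa_2^2\kappa_1 + 10\kappa_2\kappa_1^3 + \kappa_1^5, \\
\mu'_6 ={}& \kappa_6 + 6\kappa_5\kappa_1 + 15\kappa_4\kappa_2 + 15\kappa_4\kappa_1^2 + 10\kappa_3^2 + 60\kappa_3\kappa_2\kappa_1 + 20\kappa_3\kappa_1^3 \\
& + 15\kappa_2^3 + 45\kappa_2^2\kappa_1^2 + 15\kappa_2\kappa_1^4 + \kappa_1^6, \\
\mu'_7 ={}& \kappa_7 + 7\kappa_6\kappa_1 + 21\kappa_5\kappa_2 + 21\kappa_5\kappa_1^2 + 35\kappa_4\kappa_3 + 105\kappa_4\kappa_2\kappa_1 \\
& + 35\kappa_4\kappa_1^3 + 70\kappa_3^2\kappa_1 + 105\kappa_3\kappa_2^2 + 210\kappa_3\kappa_2\kappa_1^2 + 35\kappa_3\kappa_1^4 \\
& + 105\kappa_2^3\kappa_1 + 105\kappa_2^2\kappa_1^3 + 21\kappa_2\kappa_1^5 + \kappa_1^7, \\
\mu'_8 ={}& \kappa_8 + 8\kappa_7\kappa_1 + 28\kappa_6\kappa_2 + 28\kappa_6\kappa_1^2 + 56\kappa_5\kappa_3 + 168\kappa_5\kappa_2\kappa_1 + 56\kappa_5\kappa_1^3 \\
& + 35\kappa_4^2 + 280\kappa_4\kappa_3\kappa_1 + 210\kappa_4\kappa_2^2 + 420\kappa_4\kappa_2\kappa_1^2 + 70\kappa_4\kappa_1^4 \\
& + 280\kappa_3^2\kappa_2 + 280\kappa_3^2\kappa_1^2 + 840\kappa_3\kappa_2^2\kappa_1 + 560\kappa_3\kappa_2\kappa_1^3 + 56\kappa_3\kappa_1^5 \\
& + 105\kappa_2^4 + 420\kappa_2^3\kappa_1^2 + 210\kappa_2^2\kappa_1^4 + 28\kappa_2\kappa_1^6 + \kappa_1^8, \\
\mu'_9 ={}& \kappa_9 + 9\kappa_8\kappa_1 + 36\kappa_7\kappa_2 + 36\kappa_7\kappa_1^2 + 84\kappa_6\kappa_3 + 252\kappa_6\kappa_2\kappa_1 \\
& + 84\kappa_6\kappa_1^3 + 126\kappa_5\kappa_4 + 504\kappa_5\kappa_3\kappa_1 + 378\kappa_5\kappa_2^2 + 756\kappa_5\kappa_2\kappa_1^2 \\
& + 126\kappa_5\kappa_1^4 + 315\kappa_4^2\kappa_1 + 1260\kappa_4\kappa_3\kappa_2 + 1260\kappa_4\kappa_3\kappa_1^2 + 1890\kappa_4\kappa_2^2\kappa_1 \\
& + 1260\kappa_4\kappa_2\kappa_1^3 + 126\kappa_4\kappa_1^5 + 280\kappa_3^3 + 2520\kappa_3^2\kappa_2\kappa_1 + 840\kappa_3^2\kappa_1^3 \\
& + 1260\kappa_3\kappa_2^3 + 3780\kappa_3\kappa_2^2\kappa_1^2 + 1260\kappa_3\kappa_2\kappa_1^4 + 84\kappa_3\kappa_1^6 + 945\kappa_2^4\kappa_1 \\
& + 1260\kappa_2^3\kappa_1^3 + 378\kappa_2^2\kappa_1^5 + 36\kappa_2\kappa_1^7 + \kappa_1^9, \\
\mu'_{10} ={}& \kappa_{10} + 10\kappa_9\kappa_1 + 45\kappa_8\kappa_2 + 45\kappa_8\kappa_1^2 + 120\kappa_7\kappa_3 + 360\kappa_7\kappa_2\kappa_1 \\
& + 120\kappa_7\kappa_1^3 + 210\kappa_6\kappa_4 + 840\kappa_6\kappa_3\kappa_1 + 630\kappa_6\kappa_2^2 + 1260\kappa_6\kappa_2\kappa_1^2 \\
& + 210\kappa_6\kappa_1^4 + 126\kappa_5^2 + 1260\kappa_5\kappa_4\kappa_1 + 2520\kappa_5\kappa_3\kappa_2 + 2520\kappa_5\kappa_3\kappa_1^2 \\
& + 3780\kappa_5\kappa_2^2\kappa_1 + 2520\kappa_5\kappa_2\kappa_1^3 + 252\kappa_5\kappa_1^5 + 1575\kappa_4^2\kappa_2 + 1575\kappa_4^2\kappa_1^2 \\
& + 2100\kappa_4\kappa_3^2 + 12600\kappa_4\kappa_3\kappa_2\kappa_1 + 4200\kappa_4\kappa_3\kappa_1^3 + 3150\kappa_4\kappa_2^3 \\
& + 9450\kappa_4\kappa_2^2\kappa_1^2 + 3150\kappa_4\kappa_2\kappa_1^4 + 210\kappa_4\kappa_1^6 + 2800\kappa_3^3\kappa_1 \\
& + 6300\kappa_3^2\kappa_2^2 + 12600\kappa_3^2\kappa_2\kappa_1^2 + 2100\kappa_3^2\kappa_1^4 + 12600\kappa_3\kappa_2^3\kappa_1 \\
& + 12600\kappa_3\kappa_2^2\kappa_1^3 + 2520\kappa_3\kappa_2\kappa_1^5 + 120\kappa_3\kappa_1^7 + 945\kappa_2^5 + 4725\kappa_2^4\kappa_1^2 \\
& + 3150\kappa_2^3\kappa_1^4 + 630\kappa_2^2\kappa_1^6 + 45\kappa_2\kappa_1^8 + \kappa_1^{10};
\end{aligned} \qquad (3.29) \]

or, for moments about the mean ($\kappa_1 = 0$),

\[ \begin{aligned}
\mu_2 ={}& \kappa_2, \\
\mu_3 ={}& \kappa_3, \\
\mu_4 ={}& \kappa_4 + 3\kappa_2^2, \\
\mu_5 ={}& \kappa_5 + 10\kappa_3\kappa_2, \\
\mu_6 ={}& \kappa_6 + 15\kappa_4\kappa_2 + 10\kappa_3^2 + 15\kappa_2^3, \\
\mu_7 ={}& \kappa_7 + 21\kappa_5\kappa_2 + 35\kappa_4\kappa_3 + 105\kappa_3\kappa_2^2, \\
\mu_8 ={}& \kappa_8 + 28\kappa_6\kappa_2 + 56\kappa_5\kappa_3 + 35\kappa_4^2 + 210\kappa_4\kappa_2^2 + 280\kappa_3^2\kappa_2 + 105\kappa_2^4, \\
\mu_9 ={}& \kappa_9 + 36\kappa_7\kappa_2 + 84\kappa_6\kappa_3 + 126\kappa_5\kappa_4 + 378\kappa_5\kappa_2^2 + 1260\kappa_4\kappa_3\kappa_2 + 280\kappa_3^3 \\
& + 1260\kappa_3\kappa_2^3, \\
\mu_{10} ={}& \kappa_{10} + 45\kappa_8\kappa_2 + 120\kappa_7\kappa_3 + 210\kappa_6\kappa_4 + 630\kappa_6\kappa_2^2 + 126\kappa_5^2 \\
& + 2520\kappa_5\kappa_3\kappa_2 + 1575\kappa_4^2\kappa_2 + 2100\kappa_4\kappa_3^2 + 3150\kappa_4\kappa_2^3 \\
& + 6300\kappa_3^2\kappa_2^2 + 945\kappa_2^5.
\end{aligned} \]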

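Conversely we have

\[ \kappa_1 t + \cdots + \frac{\kappa_r t^r}{r!} + \cdots = \log\left(1 + \mu'_1 t + \cdots + \frac{\mu'_r t^r}{r!} + \cdots\right). \qquad (3.30) \]

Expanding the logarithm and picking out powers of $t$ as before, we have

\[ \kappa_r = r!\sum_{m=1}^{r}\sum\left(\frac{\mu'_{p_1}}{p_1!}\right)^{\pi_1}\cdots\left(\frac{\mu'_{p_m}}{p_m!}\right)^{\pi_m}\frac{(-1)^{\rho-1}(\rho-1)!}{\pi_1!\cdots\pi_m!}, \qquad (3.31) \]

the second summation extending over all non-negative $\pi$'s and $p$'s, subject to (3.26) and the further condition

\[ \pi_1 + \pi_2 + \cdots + \pi_m = \rho. \qquad (3.32) \]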

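The first ten formulae are, in terms of moments about an arbitrary point:

\[ \begin{aligned}
\kappa_1 ={}& \mu'_1, \\
\kappa_2 ={}& \mu'_2 - \mu'^{2}_1, \\
\kappa_3 ={}& \mu'_3 - 3\mu'_2\mu'_1 + 2\mu'^{3}_1, \\
\kappa_4 ={}& \mu'_4 - 4\mu'_3\mu'_1 - 3\mu'^{2}_2 + 12\mu'_2\mu'^{2}_1 - 6\mu'^{4}_1, \\
\kappa_5 ={}& \mu'_5 - 5\mu'_4\mu'_1 - 10\mu'_3\mu'_2 + 20\mu'_3\mu'^{2}_1 + 30\mu'^{2}_2\mu'_1 - 60\mu'_2\mu'^{3}_1 + 24\mu'^{5}_1, \\
\kappa_6 ={}& \mu'_6 - 6\mu'_5\mu'_1 - 15\mu'_4\mu'_2 + 30\mu'_4\mu'^{2}_1 - 10\mu'^{2}_3 + 120\mu'_3\mu'_2\mu'_1 - 120\mu'_3\mu'^{3}_1 \\
& + 30\mu'^{3}_2 - 270\mu'^{2}_2\mu'^{2}_1 + 360\mu'_2\mu'^{4}_1 - 120\mu'^{6}_1, \\
\kappa_7 ={}& \mu'_7 - 7\mu'_6\mu'_1 - 21\mu'_5\mu'_2 + 42\mu'_5\mu'^{2}_1 - 35\mu'_4\mu'_3 + 210\mu'_4\mu'_2\mu'_1 \\
& - 210\mu'_4\mu'^{3}_1 + 140\mu'^{2}_3\mu'_1 + 210\mu'_3\mu'^{2}_2 - 1260\mu'_3\mu'_2\mu'^{2}_1 + 840\mu'_3\mu'^{4}_1 \\
& - 630\mu'^{3}_2\mu'_1 + 2520\mu'^{2}_2\mu'^{3}_1 - 2520\mu'_2\mu'^{5}_1 + 720\mu'^{7}_1, \\
\kappa_8 ={}& \mu'_8 - 8\mu'_7\mu'_1 - 28\mu'_6\mu'_2 + 56\mu'_6\mu'^{2}_1 - 56\mu'_5\mu'_3 + 336\mu'_5\mu'_2\mu'_1 \\
& - 336\mu'_5\mu'^{3}_1 - 35\mu'^{2}_4 + 560\mu'_4\mu'_3\mu'_1 + 420\mu'_4\mu'^{2}_2 - 2520\mu'_4\mu'_2\mu'^{2}_1 \\
& + 1680\mu'_4\mu'^{4}_1 + 560\mu'^{2}_3\mu'_2 - 1680\mu'^{2}_3\mu'^{2}_1 - 5040\mu'_3\mu'^{2}_2\mu'_1 \\
& + 13440\mu'_3\mu'_2\mu'^{3}_1 - 6720\mu'_3\mu'^{5}_1 - 630\mu'^{4}_2 + 10080\mu'^{3}_2\mu'^{2}_1 \\
& - 25200\mu'^{2}_2\mu'^{4}_1 + 20160\mu'_2\mu'^{6}_1 - 5040\mu'^{8}_1, \\
\kappa_9 ={}& \mu'_9 - 9\mu'_8\mu'_1 - 36\mu'_7\mu'_2 + 72\mu'_7\mu'^{2}_1 - 84\mu'_6\mu'_3 + 504\mu'_6\mu'_2\mu'_1 \\
& - 504\mu'_6\mu'^{3}_1 - 126\mu'_5\mu'_4 + 1008\mu'_5\mu'_3\mu'_1 + 756\mu'_5\mu'^{2}_2 - 4536\mu'_5\mu'_2\mu'^{2}_1 \\
& + 3024\mu'_5\mu'^{4}_1 + 630\mu'^{2}_4\mu'_1 + 2520\mu'_4\mu'_3\mu'_2 - 7560\mu'_4\mu'_3\mu'^{2}_1 \\
& - 11340\mu'_4\mu'^{2}_2\mu'_1 + 30240\mu'_4\mu'_2\mu'^{3}_1 - 15120\mu'_4\mu'^{5}_1 + 560\mu'^{3}_3 \\
& - 15120\mu'^{2}_3\mu'_2\mu'_1 + 20160\mu'^{2}_3\mu'^{3}_1 - 7560\mu'_3\mu'^{3}_2 + 90720\mu'_3\mu'^{2}_2\mu'^{2}_1 \\
& - 151200\mu'_3\mu'_2\mu'^{4}_1 + 60480\mu'_3\mu'^{6}_1 + 22680\mu'^{4}_2\mu'_1 - 151200\mu'^{3}_2\mu'^{3}_1 \\
& + 272160\mu'^{2}_2\mu'^{5}_1 - 181440\mu'_2\mu'^{7}_1 + 40320\mu'^{9}_1, \\
\kappa_{10} ={}& \mu'_{10} - 10\mu'_9\mu'_1 - 45\mu'_8\mu'_2 + 90\mu'_8\mu'^{2}_1 - 120\mu'_7\mu'_3 + 720\mu'_7\mu'_2\mu'_1 \\
& - 720\mu'_7\mu'^{3}_1 - 210\mu'_6\mu'_4 + 1680\mu'_6\mu'_3\mu'_1 + 1260\mu'_6\mu'^{2}_2 \\
& - 7560\mu'_6\mu'_2\mu'^{2}_1 + 5040\mu'_6\mu'^{4}_1 - 126\mu'^{2}_5 + 2520\mu'_5\mu'_4\mu'_1 \\
& + 5040\mu'_5\mu'_3\mu'_2 - 15120\mu'_5\mu'_3\mu'^{2}_1 - 22680\mu'_5\mu'^{2}_2\mu'_1 + 60480\mu'_5\mu'_2\mu'^{3}_1 \\
& - 30240\mu'_5\mu'^{5}_1 + 3150\mu'^{2}_4\mu'_2 - 9450\mu'^{2}_4\mu'^{2}_1 + 4200\mu'_4\mu'^{2}_3 \\
& - 75600\mu'_4\mu'_3\mu'_2\mu'_1 + 100800\mu'_4\mu'_3\mu'^{3}_1 - 18900\mu'_4\mu'^{3}_2 \\
& + 226800\mu'_4\mu'^{2}_2\mu'^{2}_1 - 378000\mu'_4\mu'_2\mu'^{4}_1 + 151200\mu'_4\mu'^{6}_1 - 16800\mu'^{3}_3\mu'_1 \\
& - 37800\mu'^{2}_3\mu'^{2}_2 + 302400\mu'^{2}_3\mu'_2\mu'^{2}_1 - 252000\mu'^{2}_3\mu'^{4}_1 + 302400\mu'_3\mu'^{3}_2\mu'_1 \\
& - 1512000\mu'_3\mu'^{2}_2\mu'^{3}_1 + 1814400\mu'_3\mu'_2\mu'^{5}_1 - 604800\mu'_3\mu'^{7}_1 \\
& + 22680\mu'^{5}_2 - 567000\mu'^{4}_2\mu'^{2}_1 + 2268000\mu'^{3}_2\mu'^{4}_1 - 3175200\mu'^{2}_2\mu'^{6}_1 \\
& + 1814400\mu'_2\mu'^{8}_1 - 362880\mu'^{10}_1;
\end{aligned} \qquad (3.33) \]

or, for moments about the mean,

\[ \begin{aligned}
\kappa_2 ={}& \mu_2, \\
\kappa_3 ={}& \mu_3, \\
\kappa_4 ={}& \mu_4 - 3\mu_2^2, \\
\kappa_5 ={}& \mu_5 - 10\mu_3\mu_2, \\
\kappa_6 ={}& \mu_6 - 15\mu_4\mu_2 - 10\mu_3^2 + 30\mu_2^3, \\
\kappa_7 ={}& \mu_7 - 21\mu_5\mu_2 - 35\mu_4\mu_3 + 210\mu_3\mu_2^2, \\
\kappa_8 ={}& \mu_8 - 28\mu_6\mu_2 - 56\mu_5\mu_3 - 35\mu_4^2 + 420\mu_4\mu_2^2 + 560\mu_3^2\mu_2 - 630\mu_2^4, \\
\kappa_9 ={}& \mu_9 - 36\mu_7\mu_2 - 84\mu_6\mu_3 - 126\mu_5\mu_4 + 756\mu_5\mu_2^2 + 2520\mu_4\mu_3\mu_2 \\
& + 560\mu_3^3 - 7560\mu_3\mu_2^3, \\
\kappa_{10} ={}& \mu_{10} - 45\mu_8\mu_2 - 120\mu_7\mu_3 - 210\mu_6\mu_4 + 1260\mu_6\mu_2^2 - 126\mu_5^2 \\
& + 5040\mu_5\mu_3\mu_2 + 3150\mu_4^2\mu_2 + 4200\mu_4\mu_3^2 - 18900\mu_4\mu_2^3 \\
& - 37800\mu_3^2\mu_2^2 + 22680\mu_2^5.
\end{aligned} \qquad (3.34) \]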
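In computation it is usually easier to obtain the cumulants from the moments recursively than to substitute in the explicit formulae. The sketch below assumes the recursion $\kappa_r = \mu'_r - \sum_{j=1}^{r-1}\binom{r-1}{j-1}\kappa_j\,\mu'_{r-j}$, a standard rearrangement of (3.30) that is not stated in the text; the function name `cumulants_from_moments` is hypothetical.

```python
from math import comb

def cumulants_from_moments(mu):
    """Given [mu'_1, ..., mu'_n] about any origin, return [kappa_1, ..., kappa_n]."""
    kappa = []
    for r in range(1, len(mu) + 1):
        known = sum(comb(r - 1, j - 1) * kappa[j - 1] * mu[r - j - 1]
                    for j in range(1, r))
        kappa.append(mu[r - 1] - known)
    return kappa

# Normal distribution with variance 2, moments about the mean up to order 6:
# mu_2 = 2, mu_4 = 3 * 2**2 = 12, mu_6 = 15 * 2**3 = 120, odd moments zero.
normal = cumulants_from_moments([0, 2, 0, 12, 0, 120])
# Spot check of the fourth-order relation of (3.33) with arbitrary integer moments.
m1, m2, m3, m4 = 1, 3, 8, 30
k4 = cumulants_from_moments([m1, m2, m3, m4])[3]
print(normal)
print(k4, m4 - 4 * m3 * m1 - 3 * m2 ** 2 + 12 * m2 * m1 ** 2 - 6 * m1 ** 4)
```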


Existence of Cumulants 

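3.14. The formal expression (3.22) may be regarded as defining the cumulants in terms of the moments, and it is thus evident that the cumulant of order $r$ exists if the moments of orders $r$ and lower exist. If, however, we look to the equation

\[ \exp\left\{\sum_r \kappa_r\frac{(it)^r}{r!}\right\} = \phi(t) \]

as defining the cumulants, it is not quite so easy to show that $\kappa_r$ exists if $\mu_r$ and lower $\mu$'s exist. It may, however, be shown that $\kappa_r$ exists if $\nu_r$, the absolute moment, exists, and this is sufficient for all ordinary purposes. Some care is necessary with the proof because the variable $t$ in the characteristic function is real, but there also appears the complex quantity $i$.

We have

\[ \phi(t) = \int_{-\infty}^{\infty} e^{itx}\,dF. \]

Expanding the exponential we have, if the moments up to $\mu'_r$ exist,

\[ \phi(t) = \sum_{j=0}^{r}\mu'_j\frac{(it)^j}{j!} + R_r, \]

where

\[ R_r = \int_{-\infty}^{\infty} dF\left\{\cos xt + i\sin xt - \sum_{j=0}^{r}\frac{(ixt)^j}{j!}\right\}. \]

Considering the real and imaginary terms separately, we have, if $r$ is even,

\[ R_r = \int_{-\infty}^{\infty} dF\left\{\cos xt - \sum_{j=0}^{r/2}(-1)^j\frac{(xt)^{2j}}{(2j)!}\right\} + i\int_{-\infty}^{\infty} dF\left\{\sin xt - \sum_{j=0}^{r/2-1}(-1)^j\frac{(xt)^{2j+1}}{(2j+1)!}\right\}. \]

The real term in the integrand consists of $(-1)^{\frac{r}{2}+1}\dfrac{(xt)^r}{r!}$ plus $\cos xt$ minus the first $\dfrac{r}{2}$ terms of the Maclaurin expansion of $\cos xt$, and is thus equal to

\[ (-1)^{\frac{r}{2}+1}\frac{(xt)^r}{r!} + \frac{(xt)^r}{r!}\left[\frac{d^r}{dx^r}\cos x\right]_{x = \theta xt}, \]

where $0 < \theta < 1$. The modulus of the term is thus not greater than $\dfrac{2|xt|^r}{r!}$. Similarly for the imaginary terms. Hence

\[ |R_r| \le 3\int_{-\infty}^{\infty}\frac{|xt|^r}{r!}\,dF \le 3\nu'_r\frac{|t|^r}{r!}. \]

A similar result follows if $r$ is odd. Now if $\mu'_r$ exists only as a principal value, it does not necessarily follow that $\nu'_r$ exists. But if the latter exists we have

\[ \phi(t) = \sum_{j=0}^{r}\mu'_j\frac{(it)^j}{j!} + O(t^r). \]

We may then, for some small $t$, take logarithms and expand, obtaining

\[ \log\phi(t) = \sum_{j=1}^{r}\kappa_j\frac{(it)^j}{j!} + O(t^r), \qquad (3.35) \]

the coefficients being the cumulants by definition. Hence if $\nu'_r$ exists, $\kappa_r$ and cumulants of lower orders exist.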


Calculation of Cumulants

3.15. The cumulants are not, like the moments, determinable directly by summatory or integrative processes, and to find them it is necessary either to ascertain the moments and then apply equations (3.33), or to derive them from the characteristic function. For the latter case we have, from (3.35)

\[ \kappa_r = (-i)^r\left[\frac{d^r}{dt^r}\log\phi(t)\right]_{t=0}. \qquad (3.36) \]

The following examples will illustrate the processes involved.


Example 3.8

In Example 3.7 we found the following values for the moments about the mean of the height data of Table 1.7:

\[ \begin{aligned}
\mu'_1 &= 11.020,850 \\
\mu_2 &= 6.616,805 \\
\mu_3 &= -0.207,840 \\
\mu_4 &= 137.689,185,
\end{aligned} \]

whence, from (3.34), $\kappa_2$ and $\kappa_3$ have the same values as $\mu_2$ and $\mu_3$ and

\[ \kappa_4 = \mu_4 - 3\mu_2^2 = 6.342,86. \]

$\kappa_1$ is the same as $\mu'_1$, in this case measured from the centre of the interval 56- inches.

The same results would, of course, have been obtained if we had used equations (3.33) and moments about the origin.
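The arithmetic for $\kappa_4$ is a one-line check; the variable names below are illustrative only.

```python
mu2 = 6.616805      # second moment about the mean, from Example 3.7
mu4 = 137.689185    # fourth moment about the mean, from Example 3.7

kappa4 = mu4 - 3 * mu2 ** 2   # fourth line of (3.34)
print(round(kappa4, 5))
```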


Example 3.9

Consider the discontinuous distribution whose frequencies at the values $0, 1, \ldots, j, \ldots$ are $e^{-m},\ e^{-m}\dfrac{m}{1!},\ \ldots,\ e^{-m}\dfrac{m^j}{j!},\ \ldots$ The characteristic function is given by

\[ \phi(t) = \sum_{j=0}^{\infty} e^{-m}\frac{m^j e^{itj}}{j!} = e^{-m}\exp(me^{it}) = \exp m(e^{it} - 1). \]

Since for any $r$ the absolute moment is the same as the ordinary moment, we have

\[ \nu'_r = e^{-m}\sum_{j=0}^{\infty}\frac{m^j j^r}{j!} \]

and since this converges * cumulants of all orders exist. They are therefore given by the expansion of $\log\phi(t)$ as a power series in $t$. But

\[ \log\phi(t) = m(e^{it} - 1) \]

and hence

\[ \kappa_r = m \]

for all $r$. Thus all cumulants of the distribution are equal to $m$.
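The result can be illustrated numerically by summing the series for the first few moments and converting them to cumulants. This is a sketch only: the value of `m`, the truncation of the series at 120 terms, and the moment-to-cumulant recursion (a standard rearrangement of (3.30), not given in the text) are all assumptions made for the illustration.

```python
from math import comb, exp

m = 2.5                                   # illustrative parameter value
probs, p = [], exp(-m)
for j in range(120):                      # truncated series; the tail is negligible here
    probs.append(p)
    p *= m / (j + 1)

# moments about the origin, orders 1..5
raw = [sum(j ** r * f for j, f in enumerate(probs)) for r in range(1, 6)]

# cumulants by the recursion kappa_r = mu'_r - sum_j C(r-1, j-1) kappa_j mu'_{r-j}
kappa = []
for r in range(1, 6):
    known = sum(comb(r - 1, j - 1) * kappa[j - 1] * raw[r - j - 1] for j in range(1, r))
    kappa.append(raw[r - 1] - known)
print(kappa)
```

Each entry of `kappa` should be close to `m`, up to rounding error.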


* For the ratio of the $(n+1)$th term of the series to the $n$th is

\[ \frac{m^{n+1}(n+1)^r}{(n+1)!}\bigg/\frac{m^n n^r}{n!} = \frac{m}{n+1}\left(1 + \frac{1}{n}\right)^r, \]

which tends to zero as $n$ increases, and thus the series converges for all finite values of $m$.


Example 3.10

In Example 3.4 we found, in effect, for the characteristic function of the normal distribution

\[ dF = \frac{1}{\sigma\sqrt{2\pi}}\,e^{-x^2/2\sigma^2}\,dx, \]

\[ \phi(t) = \frac{1}{\sigma\sqrt{2\pi}}\int_{-\infty}^{\infty} e^{itx}\,e^{-x^2/2\sigma^2}\,dx = e^{-\frac12 t^2\sigma^2}, \]

\[ \log\phi(t) = -\tfrac12 t^2\sigma^2. \]

It is easily seen that the absolute moments and hence cumulants of all orders exist. Thus $\kappa_r$ is the coefficient of $\dfrac{(it)^r}{r!}$ in $\log\phi(t)$, i.e. for the normal distribution all cumulants of order higher than the second are zero. The second cumulant is equal to $\sigma^2$.


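Example 3.11

In Example 3.6 it was found that for the distribution

\[ dF = \frac{a^{\gamma}}{\Gamma(\gamma)}\,x^{\gamma-1}e^{-ax}\,dx, \qquad a > 0,\ \gamma > 0; \quad 0 \le x < \infty, \]

the characteristic function is given by

\[ \phi(t) = \frac{1}{\left(1 - \dfrac{it}{a}\right)^{\gamma}}. \]

It is readily verified that cumulants of all orders exist and hence

\[ \kappa_r = \text{coefficient of } \frac{(it)^r}{r!} \text{ in } -\gamma\log\left(1 - \frac{it}{a}\right) = \gamma\,(r-1)!\;a^{-r}. \]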


Example 3.12

Consider again the distribution of Example 3.3,

\[ dF = \frac{k\,dx}{(1+x^2)^m}, \qquad m \ge 1, \quad -\infty < x < \infty. \]

The characteristic function is given by

\[ \phi(t) = k\int_{-\infty}^{\infty}\frac{e^{ixt}}{(1+x^2)^m}\,dx, \]

which, since $\sin xt$ is an odd function, reduces to

\[ \phi(t) = k\int_{-\infty}^{\infty}\frac{\cos xt}{(1+x^2)^m}\,dx. \]

This integral may be evaluated by complex integration round a contour consisting of the $x$-axis, the infinite semicircle above the $x$-axis and the infinitely small circle round the point $x = i$. It is found *

\[ \phi(t) = \frac{k\pi e^{-|t|}}{2^{2m-2}(m-1)!}\left\{(2|t|)^{m-1} + (m-1)m\,(2|t|)^{m-2} + \frac{(m-2)(m-1)m(m+1)}{2!}(2|t|)^{m-3} + \cdots + \frac{(2m-2)!}{(m-1)!}\right\}. \]

If $r < 2m - 1$ the absolute moment of order $r$,

\[ \nu_r = k\int_{-\infty}^{\infty}\frac{|x|^r\,dx}{(1+x^2)^m}, \]

exists and hence so does the cumulant of order $r$. But in this case we cannot expand $\log\phi(t)$ in an infinite series of powers of $t$, though this might perhaps be thought possible from the form of $\phi(t)$. In fact, we can only expand $\log\phi(t)$ in powers of $t$ up to the point at which the differential coefficients of $\phi(t)$ exist, for $t = 0$.

To simplify the discussion, consider the case when $m = 2$. We have then, since $k = 2/\pi$ in this case,

\[ \phi(t) = e^{-|t|}\{|t| + 1\}, \]
\[ \log\phi(t) = -|t| + \log\{1 + |t|\}. \]

If $t$ is positive this equals

\[ -\frac{t^2}{2} + \frac{t^3}{3} - \frac{t^4}{4} + \cdots \]

but if $t$ is negative it equals

\[ -\frac{t^2}{2} - \frac{t^3}{3} - \frac{t^4}{4} - \cdots \]

the two expressions differing in the sign of the term in $t^3$ and every second term thereafter. There is thus no unique expansion of $\log\phi(t)$ in powers of $t$ about the point $t = 0$. There are two forms of the function expressing $\log\phi(t)$ according as $t$ is positive or negative.

However, these expressions coincide as far as their terms in $t$ and $t^2$, and the first and second differential coefficients of $\log\phi(t)$ are uniquely defined when $t = 0$. Thus the first and second cumulants exist, and are given by

\[ \kappa_1 = 0, \qquad \kappa_2 = 1. \]

Cumulants of higher orders do not exist.

* Results of this kind are given in several text-books of analysis, sometimes incorrectly, e.g. it is sometimes stated that

\[ \int_{-\infty}^{\infty}\frac{\cos tx}{1+x^2}\,dx = \pi e^{-t}, \]

which is only true when $t \ge 0$. The appearance of the modulus in the expression above is crucial for the purposes of the example. A correct proof is given in J. Edwards, Integral Calculus, vol. 2, article 1326.

Corrections for Grouping

3.16. When moments are calculated from a numerically specified distribution which is grouped, there is present a certain amount of approximation owing to the fact that the frequencies are assumed to be concentrated at the mid-points of intervals. It is possible to correct for this effect under certain conditions.

Suppose the frequency function $f(x)$ to be continuous. If the range is divided into intervals of width $h$, we are given, not the values of $f(x)$ at all points but the frequencies in those intervals, e.g. the frequency in the $j$th interval, centred at $x_j$, will be

\[ f_j = \int_{-h/2}^{h/2} f(x_j + \xi)\,d\xi. \]

We will denote the moments calculated from grouped frequencies (the raw moments) with a bar, so that we have

\[ \bar{\mu}'_r = \sum_{j=-\infty}^{\infty} x_j^r f_j = \sum_{j=-\infty}^{\infty} x_j^r\int_{-h/2}^{h/2} f(x_j + \xi)\,d\xi. \qquad (3.37) \]

The true moment, if it exists, is given by

\[ \mu'_r = \int_{-\infty}^{\infty} x^r f(x)\,dx \]

and it is required to investigate the relationship between the $\bar{\mu}$'s and the $\mu$'s.

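Now we have, in virtue of the Euler-Maclaurin sum formula, for an arbitrary function $\kappa(x)$ which has derivatives of the $m$th order,

\[ \frac{1}{h}\int_a^{a+nh}\kappa(x)\,dx = \tfrac12\kappa(a) + \kappa(a+h) + \kappa(a+2h) + \cdots + \kappa(a + \overline{n-1}\,h) + \tfrac12\kappa(a + nh) - \sum_{j=2}^{m-1}\frac{B_j h^{j-1}}{j!}\left[\kappa^{(j-1)}(x)\right]_a^{a+nh} - S_m, \qquad (3.38) \]

where $S_m$ is a remainder term which may be expressed as

\[ S_m = \frac{h^m n B_m}{m!}\,\kappa^{(m)}(a + \theta nh), \qquad 0 < \theta < 1, \]

$m$ even,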

and 8„ = - pR';*, + Onh), 0 < 0 < 1 

ml 

m odd.* 


* Cf. Milne Thomson, The Calculus of Finite Differences, section 7.5, for the general Euler-Maclaurin expansion. The form of $S_m$ when $m$ is even is given in section 7.5 of that book, and the above form when $m$ is odd may be derived similarly.

In our convention the Bernoulli number $B_j$ is defined as the coefficient of $t^j/j!$ in $t/(e^t - 1)$. The Bernoulli polynomial has already been defined in 3.9. Explicitly $B_0 = 1$, $B_1 = -\tfrac12$, $B_2 = \tfrac16$, $B_{2j+1} = 0$ $(j > 0)$, $B_4 = -\tfrac{1}{30}$, $B_6 = \tfrac{1}{42}$, $B_8 = -\tfrac{1}{30}$, $B_{10} = \tfrac{5}{66}$.

Suppose now that $f(x)$ is of finite range, from $a$ to $b$, derivable up to the $m$th order, and that at the end of the range $f(x)$ and its first $m$ derivatives vanish. Then $f(x)$ and the first $m$ derivatives are continuous throughout the range $-\infty$ to $+\infty$ and the function

\[ \kappa(x) = \int_{-h/2}^{h/2} x^r f(x + \xi)\,d\xi, \qquad (3.39) \]

together with its first $m + 1$ derivatives, will also be continuous throughout that range.

If $a$ is infinite (and similarly for $b$) it is assumed that

\[ \lim_{x \to -\infty} x^r f^{(j)}(x) = 0 \]

for all values of $j$ up to and including $m$, in which case $\kappa(x)$ and its first $m + 1$ derivatives will also tend to zero. Thus in either case the Euler-Maclaurin expansion (3.38) is valid for $\kappa(x)$ given by (3.39) and we may write

\[ \left[\kappa^{(j)}(x)\right]_{-\infty}^{\infty} = 0, \qquad j < m + 1. \]

Substituting in (3.38) we have, since $\kappa(-\infty) = \kappa(+\infty) = 0$,

\[ \frac{1}{h}\int_{-\infty}^{\infty}\kappa(x)\,dx = \sum_{j=-\infty}^{\infty}\kappa(x_j) - S_{m+1} = \bar{\mu}'_r - S_{m+1}. \qquad (3.40) \]

The integral on the left of this expression is equal to

\[ \frac{1}{h}\int_{-\infty}^{\infty}\int_{-h/2}^{h/2} x^r f(x + \xi)\,d\xi\,dx, \qquad (3.41) \]

provided that the multiple integral exists. If, in addition, it is absolutely convergent we may substitute $x$ for $x + \xi$ and integrate with respect to $\xi$. We shall then have

\[ \bar{\mu}'_r - S_{m+1} = \frac{1}{h}\int_{-\infty}^{\infty}\int_{-h/2}^{h/2}(x - \xi)^r f(x)\,d\xi\,dx = \frac{1}{h}\int_{-\infty}^{\infty}\frac{(x + \tfrac12 h)^{r+1} - (x - \tfrac12 h)^{r+1}}{r+1}\,f(x)\,dx \]

\[ = \sum_{j=0}^{[r/2]}\binom{r}{2j}\left(\frac{h}{2}\right)^{2j}\frac{\mu'_{r-2j}}{2j+1}, \qquad (3.42) \]

where $\left[\dfrac{r}{2}\right]$ is the integral part of $\dfrac{r}{2}$.



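Thus if $S_{m+1}$ may be neglected, (3.42) gives the raw moments in terms of the actual moments. In practice we require the latter in terms of the former and it is easy to find from (3.42) the following expressions:

\[ \begin{aligned}
\mu'_1 &= \bar{\mu}'_1 \\
\mu'_2 &= \bar{\mu}'_2 - \tfrac{1}{12}h^2 \\
\mu'_3 &= \bar{\mu}'_3 - \tfrac14\bar{\mu}'_1 h^2 \\
\mu'_4 &= \bar{\mu}'_4 - \tfrac12\bar{\mu}'_2 h^2 + \tfrac{7}{240}h^4 \\
\mu'_5 &= \bar{\mu}'_5 - \tfrac56\bar{\mu}'_3 h^2 + \tfrac{7}{48}\bar{\mu}'_1 h^4 \\
\mu'_6 &= \bar{\mu}'_6 - \tfrac54\bar{\mu}'_4 h^2 + \tfrac{7}{16}\bar{\mu}'_2 h^4 - \tfrac{31}{1344}h^6
\end{aligned} \qquad (3.43) \]

The general expression for these formulae is

\[ \mu'_r = \sum_{j=0}^{r}\left\{\binom{r}{j}\left(2^{1-j} - 1\right)B_j h^j\,\bar{\mu}'_{r-j}\right\}, \qquad (3.44) \]

where $B_j$ is the Bernoulli number of order $j$. (Cf. Wold, 1934a.)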
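The general expression (3.44) is easily mechanised. The sketch below is an illustration only: it assumes the Bernoulli numbers are generated by the usual recurrence for the coefficients of $t/(e^t - 1)$ (so that $B_1 = -\tfrac12$, as in the convention of the text), and the function names `bernoulli`, `sheppard_coefficients` and `corrected_moment` are hypothetical.

```python
from fractions import Fraction
from math import comb

def bernoulli(n_max):
    """B_0..B_n_max as coefficients of t^j/j! in t/(e^t - 1), so B_1 = -1/2."""
    B = [Fraction(1)]
    for n in range(1, n_max + 1):
        B.append(-sum(comb(n + 1, k) * B[k] for k in range(n)) / (n + 1))
    return B

def sheppard_coefficients(r):
    """Numerical coefficients of h^j * (raw moment of order r - j) in (3.44), j = 0..r."""
    B = bernoulli(r)
    return [comb(r, j) * (Fraction(2) ** (1 - j) - 1) * B[j] for j in range(r + 1)]

def corrected_moment(raw, r, h):
    """Apply (3.44); raw[k] is the raw moment of order k, with raw[0] = 1."""
    return sum(c * h ** j * raw[r - j] for j, c in enumerate(sheppard_coefficients(r)))

coeffs6 = sheppard_coefficients(6)
print(coeffs6)
```

For $r = 6$ the non-zero coefficients should be $1$, $-\tfrac54$, $\tfrac{7}{16}$ and $-\tfrac{31}{1344}$, agreeing with the last line of (3.43).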


3.17. These are the corrections known as Sheppard's. It is important to realise the conditions under which they were obtained.

(a) It is assumed that $f(x)$ is bounded and tends monotonically to zero in the directions in which the range is infinite.

(b) It is assumed that the multiple integral (3.41) is absolutely convergent. This is equivalent to supposing that the absolute moment of order $r$ exists. If $f(x)$ is finite in range and bounded, the multiple integral is certainly absolutely convergent. If the range is not finite, since $f(x)$ tends to zero monotonically in the direction or directions of infinite range,

\[ \int_{-\infty}^{\infty}\int_{-h/2}^{h/2}\left|x^r f(x + \xi)\right|\,d\xi\,dx \]

will converge or diverge with

\[ \int_{-\infty}^{\infty}\int_{-h/2}^{h/2}|x|^r f(x)\,d\xi\,dx, \]

i.e. with $\int_{-\infty}^{\infty}|x|^r f(x)\,dx$, which is the absolute moment of order $r$.

(c) It is assumed that $f(x)$ and its first $m$ derivatives vanish at the terminal points of the range when the range is finite, or that

\[ \lim_{x \to \pm\infty} x^r f^{(j)}(x) = 0 \]

for all $j$ up to and including $m$ when the range is infinite.

(d) It is assumed that $S_{m+1}$ is negligible.




Now both and less than in magnitude * and hence ^9^+1 order 

- —— multiplied by some value of in the range. Thus if is small, the range is 

finite and/'”V) sniall; will be small and may bo neglected. In particular, if 



< m 


. (3.45) 


the Sheppard corrections will give the moments accurate to order i.e. to the order 
of the terms applied in making the corrections. 


3.18. The foregoing discussion is rigorous, but the corrections may be applied in practice with considerable confidence whenever there is high-order terminal contact.


Example 3.13

Consider the distribution

\[ dF = \frac{1}{B(12, 6)}\,x^{11}(1 - x)^5\,dx, \qquad 0 \le x \le 1, \]

a case of the so-called Type I distribution. The exact frequencies for intervals of 0.1 may be obtained from the Tables of the Incomplete B-Function, and are as follows:

Centre of Interval.     Frequency.
      0.05              0.000,000,0
      0.15              0.000,009,2
      0.25              0.000,646,8
      0.35              0.009,938,2
      0.45              0.061,137,4
      0.55              0.192,199,6
      0.65              0.332,887,7
      0.75              0.297,479,9
      0.85              0.101,033,7
      0.95              0.004,667,5

      Total             1.000,000,0

The raw moments about $x = 0$ are shown in the following table:

Moment.     Raw.            Exact.          Corrected.
mu'_1       0.666,662,8     0.666,666,7     0.666,662,8
mu'_2       0.456,965,5     0.456,140,4     0.456,132,2
mu'_3       0.320,952,3     0.319,298,2     0.319,285,7
mu'_4       0.230,335,1     0.228,070,2     0.228,053,2
mu'_5       0.168,512,9     0.165,869,2     0.165,848,0
mu'_6       0.125,433,2     0.122,599,0     0.122,574,0

♦For B 2 jh.i = 0, i 

(cf. Milne Thomson, loc, cii,) 
and further 


> 0 



U’V 

n«*l ' ' 


(2jrV* 















CORRECTIONS FOR GROUPING 




The exact values of the moments are calculable by evaluating integrals of the type
$\int_0^1 x^{r}\,x^{11}(1-x)^{5}\,dx$ and are shown in the third column. The final column shows the
results obtained by applying equations (3.43), e.g.

$$\mu'_2 = \bar{\mu}'_2 - \frac{h^{2}}{12} = 0.456{,}965{,}5 - 0.000{,}833{,}3 = 0.456{,}132{,}2.$$

At the terminal x = 1, f(x) and its derivatives up to the fourth vanish. At the other end,
derivatives up to the tenth vanish. The function is bounded, of finite range, and the
derivatives remain finite throughout the range. In virtue of (3.45) it is to be expected that
corrected moments of third and lower orders will be accurate to the order of the terms in
the corrections, i,e. accurate to order A^/i2 (0*001 )and to order /t®/4(0-0001). Actually 
they are considerably more accurate than this. The corrected fourth moment is in error
by a term of order $2 \times 10^{-5}$ and this is of the same magnitude as the correcting term
$\frac{7}{240}h^{4}$ used in arriving at it. Similarly the corrected fifth and sixth moments are in error
by a term of order $10^{-5}$, of the same order as one of the correcting terms to the fifth moment,
$\frac{7}{48}\bar{\mu}'_1 h^{4}$, and of the same order as or greater order than two correcting terms to the sixth
moment, $\frac{7}{16}\bar{\mu}'_2 h^{4}$ and $-\frac{31}{1344}h^{6}$.

Thus the corrected moments are in all cases a substantial improvement on the raw
moments ; but in applying the corrections it is necessary to guard against being misled
about the accuracy of the final result by the apparent precision of some of the small
corrective terms.
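
The figures of this example can be checked numerically. The following is a minimal sketch, not the method used to prepare the tables: it assumes the density is proportional to x^11 (1 - x)^5 (as implied by the terminal contact described above), obtains the interval frequencies from the binomial-tail identity for the incomplete B-function with integer indices, and applies the corrections of (3.43). The names `beta_cdf` and `sheppard` are hypothetical helpers introduced for the illustration.

```python
from math import comb

def beta_cdf(x, a=12, b=6):
    """Incomplete B-function ratio I_x(a, b) for integer a, b (binomial-tail identity)."""
    n = a + b - 1
    return sum(comb(n, k) * x**k * (1 - x)**(n - k) for k in range(a, n + 1))

def sheppard(raw, h):
    """Sheppard corrections applied to raw moments of orders 1 to 6 about a fixed origin."""
    m1, m2, m3, m4, m5, m6 = raw
    return [
        m1,
        m2 - h**2 / 12,
        m3 - m1 * h**2 / 4,
        m4 - m2 * h**2 / 2 + 7 * h**4 / 240,
        m5 - 5 * m3 * h**2 / 6 + 7 * m1 * h**4 / 48,
        m6 - 5 * m4 * h**2 / 4 + 7 * m2 * h**4 / 16 - 31 * h**6 / 1344,
    ]

h = 0.1
edges = [i / 10 for i in range(11)]
centres = [(i + 0.5) / 10 for i in range(10)]
freqs = [beta_cdf(edges[i + 1]) - beta_cdf(edges[i]) for i in range(10)]

# Raw moments: every frequency is treated as concentrated at its interval centre.
raw = [sum(c**r * f for c, f in zip(centres, freqs)) for r in range(1, 7)]
corrected = sheppard(raw, h)

# Exact moments of the assumed distribution: product of (12 + k)/(18 + k).
exact, m = [], 1.0
for k in range(6):
    m *= (12 + k) / (18 + k)
    exact.append(m)

for r in range(6):
    print(r + 1, f"{raw[r]:.7f}  {exact[r]:.7f}  {corrected[r]:.7f}")
```

Each printed row gives the order of the moment followed by the raw, exact and corrected values, in the same column order as the table of the example.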


Example 3.14

As an illustration of the way in which Sheppard’s corrections break down when the
condition for high-order contact is violated, an example is taken from a paper by Pairman
and Pearson (1918). The following table shows the frequencies in a certain range of the
normal distribution

$$dF = \frac{1}{\sqrt{2\pi}}\,e^{-\frac{1}{2}x^{2}}\,dx$$

with intervals of width 0.5.

Interval centred at     Frequency.
      1.5               0.655,91
      2.0               0.278,34
      2.5               0.092,45
      3.0               0.024,02
      3.5               0.004,89
      4.0               0.000,78
      4.5               0.000,10
      5.0               0.000,01

      Total             1.056,50

The distribution has high-order contact at one end but not at the start of the curve, 
being in fact J-shaped and very abrupt at that point. 

The following table shows the raw moments about the mean up to the fourth order,
the moments with Sheppard’s corrections and the true moments calculated from the
continuous normal distribution:

Moment.     Raw.         Exact.       Corrected.
$\mu_2$     0.158,524    0.172,222    0.137,691
$\mu_3$     0.104,226    0.098,612    0.104,226
$\mu_4$     0.149,090    0.156,405    0.131,097


It will be noted that in the two cases where the corrections are made they operate in the 
wrong direction. For the fourth moment they increase the difference between calculated 
and true values from about 4 per cent, to about 16 per cent. It is clear that, at least for 
the fairly coarse grouping of this example, Sheppard’s corrections may fail completely. 
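
The failure described in this example can be reproduced approximately. The sketch below is an illustration under stated assumptions, not a reconstruction of Pairman and Pearson's working: it takes the range to be 1.25 to 5.25, computes the interval frequencies from the error function, and obtains the "true" moments of the truncated curve by Simpson's rule. The helper names `Phi`, `simpson`, `raw_central` and `true_central` are hypothetical.

```python
from math import erf, exp, pi, sqrt

def Phi(x):
    """Standard normal distribution function."""
    return 0.5 * (1 + erf(x / sqrt(2)))

def simpson(g, a, b, n=2000):
    """Composite Simpson rule with n (even) panels."""
    step = (b - a) / n
    s = g(a) + g(b)
    s += 4 * sum(g(a + (2 * i - 1) * step) for i in range(1, n // 2 + 1))
    s += 2 * sum(g(a + 2 * i * step) for i in range(1, n // 2))
    return s * step / 3

h = 0.5
centres = [1.5 + h * i for i in range(8)]
lo, hi = centres[0] - h / 2, centres[-1] + h / 2   # assumed range 1.25 to 5.25
freqs = [Phi(c + h / 2) - Phi(c - h / 2) for c in centres]
total = sum(freqs)

mean_raw = sum(c * f for c, f in zip(centres, freqs)) / total

def raw_central(r):
    """Raw (grouped) moment of order r about the grouped mean."""
    return sum((c - mean_raw) ** r * f for c, f in zip(centres, freqs)) / total

def density(x):
    return exp(-0.5 * x * x) / sqrt(2 * pi)

mass = simpson(density, lo, hi)
mean_true = simpson(lambda x: x * density(x), lo, hi) / mass

def true_central(r):
    """Moment of order r about the mean of the continuous truncated curve."""
    return simpson(lambda x: (x - mean_true) ** r * density(x), lo, hi) / mass

raw2, raw4 = raw_central(2), raw_central(4)
cor2 = raw2 - h**2 / 12
cor4 = raw4 - raw2 * h**2 / 2 + 7 * h**4 / 240
true2, true4 = true_central(2), true_central(4)

print(f"mu2: raw {raw2:.6f}  true {true2:.6f}  corrected {cor2:.6f}")
print(f"mu4: raw {raw4:.6f}  true {true4:.6f}  corrected {cor4:.6f}")
```

In both cases the corrected value lies further from the true value than the raw one, which is the behaviour the example is meant to exhibit.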

3.19. Equations (3.43) were written in terms of moments about an arbitrary point.
This point can, in particular, be the mean of the distribution, and accordingly we may
drop the dashes and put the first moment equal to zero in (3.43), to get the corrections
appropriate for moments about the mean.

3.20. The discussion of the Sheppard corrections up to this point, and Examples
3.13 and 3.14, have supposed that the given frequencies were those of a distribution 
which was exactly specified by a continuous mathematical function. In practice this 
case very rarely occurs, the most common necessity for grouping corrections arising when 
moments are calculated from tables such as those of Chapter 1. For such tables it is 
not possible to state categorically that the corrections will result in an improvement; 
but there are usually strong presumptions to that effect. Consider, for example, the 
height data of Table 1.7 (Example 3.7). There can be no doubt that the histogram provided 
by this material can be graduated by a smooth curve and that such a curve will give better 
values of the moments than the histogram. Moreover, the tailing-off at the extremes of 
the distribution supports the assumption that the conditions for terminal contact are 
satisfied. It may therefore be confidently assumed that Sheppard’s corrections as applied
to the grouped data will give values closer to the exact moments which
would have been derived from the ungrouped data had they been available.

Average Corrections 

3.21. There is a distinct type of problem which also leads to the Sheppard corrections.
Suppose there is given a distribution of unknown range and the frequencies falling into
specified intervals; one may ask what are the corrections to be applied to the raw moments
so as to bring them on the average into closer relation with the real moments. In other
words, supposing that the interval-mesh is located at random on the distribution, what
are the average values of the raw moments ?

Let $x_j$ be a fixed set of values of x, j varying from $-\infty$ to $\infty$ by integral values. As



By definition 



h 





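Denoting by $E(\bar{\mu}'_r)$ the average of the raw moment as each $x_j$ is displaced through the
range $-\tfrac{1}{2}h$ to $+\tfrac{1}{2}h$, we have

$$E(\bar{\mu}'_r) = \frac{1}{h}\int_{-\infty}^{\infty}\int_{-h/2}^{h/2} x^{r} f(x+\xi)\,d\xi\,dx, \qquad (3.46)$$

which is the same as equation (3.40) with the omission of the remainder term and the
substitution of $E(\bar{\mu}'_r)$ for $\bar{\mu}'_r$. Thus the Sheppard corrections apply for the average
group-moments whatever the nature of the terminal contact.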
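
The averaging argument can be illustrated numerically on a distribution with no terminal contact at all. The sketch below is an assumption-laden illustration: it uses the exponential distribution dF = e^{-x} dx (a J-shaped curve chosen for convenience, not taken from the text), places the mesh at many equally spaced offsets across one interval, and averages the raw moments about the origin. The function names are hypothetical.

```python
from math import exp

def F(x):
    """Distribution function of the assumed exponential distribution."""
    return 1 - exp(-x) if x > 0 else 0.0

def raw_moment(r, h, offset, n_cells=200):
    """Raw moment of order r about the origin for a mesh with centres offset + j*h."""
    total = 0.0
    for j in range(-1, n_cells):
        c = offset + j * h
        total += c**r * (F(c + h / 2) - F(c - h / 2))
    return total

def average_raw_moment(r, h, n_offsets=1000):
    """Average of the raw moment over mesh positions spread uniformly across one interval."""
    offsets = [(k + 0.5) * h / n_offsets for k in range(n_offsets)]
    return sum(raw_moment(r, h, t) for t in offsets) / n_offsets

h = 1.0
avg1 = average_raw_moment(1, h)
avg2 = average_raw_moment(2, h)

# The true moments about the origin are 1 and 2.
print(f"average raw first moment  {avg1:.5f}")
print(f"average raw second moment {avg2:.5f}  (true value plus h^2/12 = {2 + h**2 / 12:.5f})")
```

On the average the raw second moment exceeds the true one by h^2/12, so that the Sheppard correction restores it, although for any single position of the mesh the corrected value may be in error.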

They cannot, however, be applied indiscriminately on that ground. In place of the 
conditions about terminal contact, which ensure the applicability of Sheppard’s corrections
to any particular distribution, there is the condition that the grouping intervals are located
at random on the range, which implies that although the corrections may be wrong in any
given instance, the average effect in a large number of cases will be correct. In actual 
fact the condition about the random location of grouping does not operate very frequently 
for J- and U-shaped distributions, where the Sheppard corrections would not ordinarily 
apply ; for instance, in a distribution of incomes or deaths at given ages it is almost inevitable 
to begin the grouping at zero. 


3.22. It is also illegitimate to drop the dashes in order to obtain corrections for
moments about the mean. If the mean of the grouped distribution is denoted by $\bar{y}$, the
average value of the rth moment about the mean is given by

$$E(\bar{\mu}_r) = \frac{1}{h}\int_{-\infty}^{\infty}\int_{-h/2}^{h/2} (x-\bar{y})^{r} f(x+\xi)\,d\xi\,dx,$$

where $\bar{y}$ itself depends on the position of the grouping mesh, so that the transformation of
the integral which has been used earlier in this chapter is no longer legitimate. Explicit
expressions for average corrections to moments about the mean have not yet been obtained.
From a consideration of some particular distributions, however, Kendall (1938) concluded
that for all ordinary purposes it is sufficient to use equations (3.43) as if the mean were a
fixed point.


3.23. The Sheppard corrections have also been considered from a slightly different 
point of view (Fisher, 1921). As the centres of the intervals move along the variate axis, 
the raw moments vary according to the different groupings which result; and this variation
is evidently periodic of period h. We may thus write 

Ar = X* ^ j A 





where 

and may put this equal to 


= (j 


$$A_0 + A_1 \sin\theta + A_2 \sin 2\theta + \dots + B_1 \cos\theta + B_2 \cos 2\theta + \dots$$

Then, multiplying by $\sin s\theta$ or $\cos s\theta$ and integrating from 0 to $2\pi$, we have

, h 




%n 

sin sd dO 

0 


2n 

COS sO (10 


r '^2 

J .{'/w 

i ,C’/W 


dx 


dx 


and in particular 


1 vn 




£71 


Since dO = dC, we have 


1 fo-im ft-^2 

2 


which is the same as (3.41) and (3.46), and thus leads to the Sheppard corrections. 
For the periodic terms we have 


" yi? "C^ 


dx 


'"2 

For some mathematically specified distributions we are able to consider the magnitude 





of those periodic terms. For instance, for the normal curve referred to the true mean 
we have, since 


. 27t8C 

. C«n-^ « 

h r"® 
ns] 


ns 


h 2n€x 

--cos cos m 

ns h 

2 nsx 1 7 

cos —T”. — T^e 2<y* ao? 


A oV 2n 


1^8 — 

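where ${}_1A_s$ and ${}_1B_s$ refer to the coefficients for the corrections to the mean. The grouping
error of the mean is thus

$$-\frac{h}{\pi}\left(e^{-2\pi^{2}\sigma^{2}/h^{2}}\sin\theta - \tfrac{1}{2}e^{-8\pi^{2}\sigma^{2}/h^{2}}\sin 2\theta + \dots\right).$$

For a grouping in which $\sigma = h$ (a very coarse grouping) this is, approximately,
$-\frac{h}{\pi}e^{-2\pi^{2}}\sin\theta$ and thus cannot be greater than $\frac{h}{\pi}e^{-2\pi^{2}}$.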
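
The size of the periodic term in the mean can be checked directly. The following sketch rests on assumptions of its own: the normal curve has mean zero, the interval centres lie at delta + jh (so that delta, a symbol introduced here, measures the displacement of a centre from the true mean), and the series used for comparison is the Fourier expansion of the grouping error under that convention, whose sign and phase need not agree with the angle used in the text. The function names are hypothetical.

```python
from math import erf, exp, pi, sin, sqrt

def Phi(z):
    """Standard normal distribution function."""
    return 0.5 * (1 + erf(z / sqrt(2)))

def grouped_mean(sigma, h, delta, span=12):
    """Mean of a grouped N(0, sigma^2) curve with interval centres at delta + j*h."""
    n = int(span * sigma / h) + 2
    total = 0.0
    for j in range(-n, n + 1):
        c = delta + j * h
        total += c * (Phi((c + h / 2) / sigma) - Phi((c - h / 2) / sigma))
    return total

def series_error(sigma, h, delta, terms=5):
    """Fourier series for the grouping error of the mean (convention assumed above)."""
    return (h / pi) * sum(
        (-1) ** (s + 1) / s
        * exp(-2 * pi**2 * s**2 * sigma**2 / h**2)
        * sin(2 * pi * s * delta / h)
        for s in range(1, terms + 1)
    )

# A grouping much coarser than any met in practice, so that the error is visible.
direct = grouped_mean(0.5, 1.0, 0.2)
series = series_error(0.5, 1.0, 0.2)
print(f"direct {direct:.10f}  series {series:.10f}")

# The case sigma = h: the error cannot exceed (h/pi) exp(-2 pi^2).
bound = (1.0 / pi) * exp(-2 * pi**2)
worst = grouped_mean(1.0, 1.0, 0.25)
print(f"sigma = h: error {worst:.3e}, bound {bound:.3e}")
```

Even for the very coarse grouping in which the interval equals the standard deviation, the error of the mean is of the order of one part in a thousand million of the interval.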


3.24. Average corrections may also be applied to discrete data which have been
grouped in wider intervals but are different from those of the continuous case. Cf. Exercise
3.13 and C. C. Craig (1936).


Sheppard’s Corrections to Factorial Moments

3.25. It has been shown by Wold (1934a) that for factorial moments the Sheppard
corrections are as follows:

A^[)l ~ /^ii] 

IC 

/«!?] — — — 


A*.- 


;j3 


j«[3] — Ppi — + 4 

^ , 3 -. 71,, 

A*[4] ~ P'14] — —-h 


PlS] 

P[0] 


Pm — yPisj^® + 2 ~ 


, 5 , 

PtO] ~ ^P[4]^’' + ‘’P[.i]^'“ 

, 93 9120,. 

•4- ““Miiia -—— H 

^ 448 


213 , 


. (3.47) 


and in general are given by 

P'[r] = 




. (.3.48) 



where the Bernoulli polynomial is equal to

and 


j> I 


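Sheppard’s Corrections for Cumulants

3.26. As in section 3.16, and under the same conditions, we have, writing $\theta$ for $it$,

$$\sum_j \left\{ e^{\theta x_j}\int_{-h/2}^{h/2} f(x_j+\xi)\,d\xi \right\} = \frac{1}{h}\int_{-\infty}^{\infty}\int_{-h/2}^{h/2} e^{\theta x} f(x+\xi)\,d\xi\,dx$$

$$= \frac{1}{h}\int_{-h/2}^{h/2} e^{-\theta\xi}\,d\xi \int_{-\infty}^{\infty} e^{\theta x} f(x)\,dx$$

$$= \frac{\sinh\frac{\theta h}{2}}{\frac{\theta h}{2}} \int_{-\infty}^{\infty} e^{\theta x} f(x)\,dx. \qquad (3.49)$$

The expression on the left gives the characteristic function for the grouped data, and the
integral on the right the true characteristic function. Taking logarithms of both sides
and noting * that

$$\log\frac{\sinh\frac{\theta h}{2}}{\frac{\theta h}{2}} = \sum_{r=2}^{\infty} \frac{B_r(\theta h)^{r}}{r\cdot r!},$$

we have, for the coefficient in $\theta^{r}/r!$,

$$\kappa_r = \bar{\kappa}_r - \frac{B_r h^{r}}{r}, \qquad r > 1, \qquad (3.50)$$

an attractively simple result for the Sheppard corrections to cumulants. Since all $B_r$
of odd order are zero except $B_1$ and the first cumulant is equal to the mean, no cumulant
of odd order needs any correction. For the others we have

$$\kappa_2 = \bar{\kappa}_2 - \frac{h^{2}}{12}, \qquad \kappa_4 = \bar{\kappa}_4 + \frac{h^{4}}{120}, \qquad \kappa_6 = \bar{\kappa}_6 - \frac{h^{6}}{252}. \qquad (3.51)$$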
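
The coefficients in (3.51) follow mechanically from the Bernoulli numbers. As a small sketch (the function names are hypothetical, and the convention in which the first Bernoulli number is -1/2 is assumed), exact rational arithmetic gives the quantity to be subtracted from the raw cumulant of any order:

```python
from fractions import Fraction
from math import comb

def bernoulli(n):
    """Bernoulli numbers B_0, ..., B_n, with the convention B_1 = -1/2."""
    B = [Fraction(1)]
    for m in range(1, n + 1):
        # Recurrence: sum_{k=0}^{m} C(m+1, k) B_k = 0.
        B.append(-sum(comb(m + 1, k) * B[k] for k in range(m)) / (m + 1))
    return B

def cumulant_correction(r, h):
    """Amount B_r h^r / r to subtract from the raw cumulant of order r > 1."""
    return bernoulli(r)[r] * Fraction(h) ** r / r

for r in range(2, 9):
    print(r, cumulant_correction(r, 1))
```

With unit interval the printed amounts are 1/12 for the second cumulant, -1/120 for the fourth and 1/252 for the sixth, and zero for every odd order.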


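* By definition

$$\frac{\theta}{e^{\theta}-1} = \sum_{r=0}^{\infty}\frac{B_r\theta^{r}}{r!}$$

and hence

$$\frac{1}{e^{\theta}-1} - \frac{1}{\theta} = \sum_{r=1}^{\infty}\frac{B_r\theta^{r-1}}{r!};$$

integrating from 0 to $\theta$ we have the above result.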





Grouping Corrections when the Distribution is Abrupt 

3.27. Various writers have considered the corrections to be applied when one or
both terminals of the distribution do not obey the Sheppard conditions for terminal contact. 
References are given at the end of this chapter. 


Multivariate Moments and Cumulants 


3.28. The foregoing results in this chapter may be readily generalised to the multivariate
case. To save complicating the algebraic expressions, we shall deal with
two variates $x_1$ and $x_2$ ; but the reader will have little difficulty in carrying out any
generalisations for more variates.

The bivariate moment about an origin $a_1$ for $x_1$ and $a_2$ for $x_2$ is defined by

$$\mu'_{rs} = \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} (x_1-a_1)^{r}(x_2-a_2)^{s}\,dF. \qquad (3.52)$$

If one of r, s is zero the moment becomes the ordinary univariate moment of the row or
column-border distribution of the bivariate population. In the contrary case we meet
a new type of moment, the product-moment. The first product-moment $\mu'_{11}$ is of particular
importance in the theory of correlation. The first product-moment about the variate
means, $\mu_{11}$, is known as the Covariance.

As in the univariate case, bivariate moments about certain points can be expressed
in terms of those about other points. If the $x_1$ origin is transferred from $a_1$ to $b_1$, where
$c_1 = b_1 - a_1$, and the $x_2$ origin from $a_2$ to $b_2$, where $c_2 = b_2 - a_2$, we have

$$\mu'_{rs}(a_1, a_2) = (\mu' + c_1)^{r}(\mu' + c_2)^{s} \qquad (3.53)$$

where the product $\mu'^{\,j}\mu'^{\,k}$ on the right is to be replaced by $\mu'_{jk}(b_1, b_2)$. This corresponds to
the symbolic equation

$$\mu'_r(a) = \{\mu'(b) + c\}^{r}$$

for the univariate case.

Methods of calculating the product-moments for numerically specified distributions
will be considered in Chapter 14. The determination of bivariate moments for a
mathematically specified population is a matter of evaluating double sums or double integrals,
and no new statistical points call for comment.


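Example 3.15

The bivariate distribution

$$dF = \frac{1}{2\pi\sigma_1\sigma_2(1-\rho^{2})^{1/2}} \exp\left\{-\frac{1}{2(1-\rho^{2})}\left(\frac{x_1^{2}}{\sigma_1^{2}} - \frac{2\rho x_1x_2}{\sigma_1\sigma_2} + \frac{x_2^{2}}{\sigma_2^{2}}\right)\right\} dx_1\,dx_2, \qquad -\infty < x_1, x_2 < \infty.$$

Let us evaluate

$$M(t_1, t_2) = \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} e^{t_1x_1 + t_2x_2}\,dF.$$

Making the substitution

$$\xi = x_1 - \sigma_1^{2}t_1 - \rho\sigma_1\sigma_2t_2, \qquad \eta = x_2 - \rho\sigma_1\sigma_2t_1 - \sigma_2^{2}t_2,$$

we find

$$M(t_1, t_2) = \exp\{\tfrac{1}{2}(t_1^{2}\sigma_1^{2} + 2\rho\sigma_1\sigma_2t_1t_2 + t_2^{2}\sigma_2^{2})\}\;\frac{1}{2\pi\sigma_1\sigma_2(1-\rho^{2})^{1/2}}\int_{-\infty}^{\infty}\int_{-\infty}^{\infty} \exp\left\{-\frac{1}{2(1-\rho^{2})}\left(\frac{\xi^{2}}{\sigma_1^{2}} - \frac{2\rho\xi\eta}{\sigma_1\sigma_2} + \frac{\eta^{2}}{\sigma_2^{2}}\right)\right\} d\xi\,d\eta$$

$$= \exp\{\tfrac{1}{2}(t_1^{2}\sigma_1^{2} + 2\rho\sigma_1\sigma_2t_1t_2 + t_2^{2}\sigma_2^{2})\}.$$

Now $\mu_{rs}$ is the coefficient of $\frac{t_1^{r}t_2^{s}}{r!\,s!}$ in $M(t_1, t_2)$ and thus we find, for instance,

$$\mu_{20} = \sigma_1^{2}, \quad \mu_{11} = \rho\sigma_1\sigma_2, \quad \mu_{02} = \sigma_2^{2},$$

$$\mu_{30} = \mu_{21} = \mu_{12} = \mu_{03} = 0,$$

$$\mu_{40} = 3\sigma_1^{4}, \quad \mu_{31} = 3\rho\sigma_1^{3}\sigma_2, \quad \mu_{22} = (1 + 2\rho^{2})\sigma_1^{2}\sigma_2^{2},$$

$$\mu_{13} = 3\rho\sigma_1\sigma_2^{3}, \quad \mu_{04} = 3\sigma_2^{4}.$$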
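
These product-moments can be verified without expanding the generating function. The sketch below uses a different route from the one in the example, chosen only for convenience: the two variates are written as linear functions of two independent standard normal variates, and the moment is built up from the known moments of the standard normal curve. The function names are hypothetical.

```python
from math import comb

def normal_moment(k):
    """Moment of order k of a standard normal variate: 0 if k is odd, (k-1)!! if even."""
    if k % 2:
        return 0
    out = 1
    for i in range(k - 1, 0, -2):
        out *= i
    return out

def bivariate_moment(r, s, s1, s2, rho):
    """Moment mu_rs about the means, using
    x1 = s1*z1 and x2 = s2*(rho*z1 + sqrt(1 - rho^2)*z2), with z1, z2 independent."""
    c = (1 - rho * rho) ** 0.5
    total = 0.0
    for k in range(s + 1):
        # Binomial expansion of (rho*z1 + c*z2)^s, then independence of z1 and z2.
        total += (comb(s, k) * rho**k * c ** (s - k)
                  * normal_moment(r + k) * normal_moment(s - k))
    return s1**r * s2**s * total

s1, s2, rho = 2.0, 3.0, 0.5
print("mu_11 =", bivariate_moment(1, 1, s1, s2, rho))
print("mu_22 =", bivariate_moment(2, 2, s1, s2, rho))
print("mu_31 =", bivariate_moment(3, 1, s1, s2, rho))
```

For the values chosen the results agree with the formulae of the example, for instance (1 + 2 rho^2) sigma_1^2 sigma_2^2 = 54 for the moment of order (2, 2).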

3.29. The bivariate analogue of equation (3.22) may be written

''V 


or symbolically, 


where 



+ . . . 


“ 1 4* 



iiof 

Oil! 

exp 

{ 4 '«' 

+ 

4- 

II 

KfX^ 

plot 


MrUftfS I. 


^p-\ y 1 

\v - l)!l! 

In terms of characteristic functions we may define 


and, as before, write


j —00 j —00 

<f>{tij tz) 

r,“() 

Zj_ 's\ J 


= i: 

r, 8=0 


(ifl) r 
rl .<.>! 


= exp 


t, S-^” 1 


(3.54) 

(3.55) 

(3.56) 

(3.57) 


(3.58) 


subject to conditions of existence. 

From these equations the bivariate moments can be expressed in terms of bivariate
cumulants and vice-versa. It is also possible to derive bivariate equations from the univariate
equations by symbolic processes (cf. Kendall, 1941).

3.30. Wold (1934b) has given the following expressions for Sheppard corrections to
bivariate moments and cumulants, the variates being grouped in intervals $h_1$, $h_2$.


f^o fcTo ' ' 


. (3.59). 





In particular 

/»20 “ P-M “ Mil “= Piv Mo2 — Poi ~ J 

, A* , , , A* 

iWito = P'so — A*21 ~ symmetrical equations 

, A® 7 , , , A® 

/*4o = Aio — P'io-^ + 2 ^i> ~ Pi^'l> symmetrical equations 


/^22 == M22 — M 20 


Ai 


12 


Ao2~ + 


For cumulants we have


hfhl 




144 






r, a 

> 0 ' 

^rO = 


_BMx 

r 

r > 2 

■ 



s 

« > 2 


(3.60) 


(3.61 


Measures of Skewness 

3.31. We have considered measures of location and dispersion in Chapter 2. With
the aid of the moments we can now proceed to consider measures of other qualities of the
population, and in particular its departure from symmetry.

In a symmetrical population, mean, median and mode coincide. It is thus natural
to take the deviation mean to mode or mean to median as measuring the skewness of the
distribution. K. Pearson proposed the measure

$$Sk = \frac{\text{Mean} - \text{mode}}{\sigma}$$

which is subject to the inconvenience of determining the mode. For a wide class of
frequency-distributions known as Pearson’s (cf. Chapter 6), this measure may, however, be
expressed exactly in terms of the first four moments. We define

$$\beta_1 = \frac{\mu_3^{2}}{\mu_2^{3}}, \qquad (3.62)$$

$$\beta_2 = \frac{\mu_4}{\mu_2^{2}}. \qquad (3.63)$$

Then it may be shown that for Pearson curves

$$Sk = \frac{\sqrt{\beta_1}\,(\beta_2 + 3)}{2(5\beta_2 - 6\beta_1 - 9)} \qquad (3.64)$$

and this equation may be taken as defining a measure of skewness applicable to all
distributions whose moments up to and including the fourth exist.

The coefficient $\beta_1$ itself is also a measure of skewness. Clearly if the distribution is
symmetrical it vanishes since $\mu_3$ vanishes, and the size of $\mu_3^{2}$ relative to $\mu_2^{3}$ (or $\sqrt{\beta_1}$)
indicates the extent of the departure from symmetry.



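Generally we may define

$$\beta_{2n+1} = \frac{\mu_3\,\mu_{2n+3}}{\mu_2^{\,n+3}}, \qquad \beta_{2n} = \frac{\mu_{2n+2}}{\mu_2^{\,n+1}}, \qquad (3.65)$$

quantities which are not in general use but will be found to occur occasionally in statistical
literature.

More convenient quantities than $\beta_1$ and $\beta_2$ for certain purposes are

$$\gamma_1 = \sqrt{\beta_1} = \frac{\mu_3}{\mu_2^{3/2}} = \frac{\kappa_3}{\kappa_2^{3/2}}, \qquad (3.66)$$

$$\gamma_2 = \beta_2 - 3 = \frac{\mu_4}{\mu_2^{2}} - 3 = \frac{\kappa_4}{\kappa_2^{2}}. \qquad (3.67)$$

If the distribution is expressed in standard measure, $\gamma_1$ and $\gamma_2$ are its third and fourth
cumulants.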


Kurtosis

3.32. In the so-called “normal” distribution

$$dF = \frac{1}{\sigma\sqrt{2\pi}}\,e^{-\frac{x^{2}}{2\sigma^{2}}}\,dx, \qquad -\infty < x < \infty,$$

$\beta_2$ attains the value 3 and $\gamma_2$ is zero. Curves for which $\gamma_2 = 0$ are called Mesokurtic.
Those for which $\gamma_2 > 0$ are called Leptokurtic and will, relative to the normal curve, be
sharply peaked. Those for which $\gamma_2 < 0$ are called Platykurtic and will be flat-topped.


Example 3.16

For the distribution of Australian marriages considered in Example 3.1 we found,
for the raw moments about the mean in units of three years,

$\bar{\mu}_2$ = 7.056,977, $\bar{\mu}_3$ = 36.151,505, $\bar{\mu}_4$ = 408.738,210.

With Sheppard’s corrections these become

$\mu_2$ = 6.973,644, $\mu_3$ = 36.151,595, $\mu_4$ = 405.238,888.

From these values we find

$\beta_1$ = 3.854, $\gamma_1$ = 1.963
$\beta_2$ = 8.333, $\gamma_2$ = 5.333

indicating considerable skewness and leptokurtosis.
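
The arithmetic of this example is easily repeated. A minimal sketch, taking as input the corrected moments quoted above and using a hypothetical function name:

```python
def shape_coefficients(mu2, mu3, mu4):
    """Return beta_1, gamma_1, beta_2, gamma_2 from moments about the mean."""
    beta1 = mu3**2 / mu2**3
    beta2 = mu4 / mu2**2
    gamma1 = mu3 / mu2**1.5
    gamma2 = beta2 - 3
    return beta1, gamma1, beta2, gamma2

# Corrected moments of the marriage distribution, as quoted in the example.
b1, g1, b2, g2 = shape_coefficients(6.973644, 36.151595, 405.238888)
print(f"beta1 = {b1:.3f}, gamma1 = {g1:.3f}")
print(f"beta2 = {b2:.3f}, gamma2 = {g2:.3f}")
```

To three decimal places the printed values reproduce those given in the example.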


Example 3.17

From the formulae for the moments of the binomial distribution considered in Example
3.2 we find

$$\gamma_1 = \frac{q - p}{\sqrt{npq}}, \qquad \gamma_2 = \frac{1 - 6pq}{npq},$$

so that, as $n \to \infty$, $\gamma_1$ and $\gamma_2 \to 0$. This is in accordance with a result we shall prove later,
that the binomial tends to the normal form as n tends to infinity.
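
The two formulae may be compared with values computed directly from the binomial frequencies. The sketch below is an illustration only; the function names are hypothetical and the particular values of n and p are arbitrary.

```python
from math import comb, sqrt

def binomial_gammas(n, p):
    """gamma_1 and gamma_2 computed directly from the binomial frequencies."""
    q = 1 - p
    pmf = [comb(n, k) * p**k * q ** (n - k) for k in range(n + 1)]
    mean = sum(k * w for k, w in enumerate(pmf))

    def mu(r):
        return sum((k - mean) ** r * w for k, w in enumerate(pmf))

    return mu(3) / mu(2) ** 1.5, mu(4) / mu(2) ** 2 - 3

def formula_gammas(n, p):
    """The closed forms of the example."""
    q = 1 - p
    return (q - p) / sqrt(n * p * q), (1 - 6 * p * q) / (n * p * q)

for n in (5, 20, 200):
    print(n, binomial_gammas(n, 0.3), formula_gammas(n, 0.3))
```

Both coefficients diminish as n increases, the first as the inverse square root of n and the second as the inverse of n.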



Moments as Characteristics of a Distribution

3.33. The use of moments and cumulants in determining the nature of a
frequency-distribution will be abundantly illustrated in later chapters, but some general
remarks may be made at this stage.

It has been noted that the characteristic function determines the moments when
they exist, and it will be proved in Chapter 4 that the characteristic function also
determines the distribution function. It does not, however, follow that the moments completely
determine the distribution, even when moments of all orders exist. Only under certain
conditions will a set of moments determine a distribution uniquely, but, fortunately for
statisticians, those conditions are obeyed by all the distributions arising in statistical practice.
For all ordinary purposes, therefore, a knowledge of the moments, when they all exist, is
equivalent to a knowledge of the distribution function : equivalent, that is, in the sense that
it should be possible theoretically to exhibit all the properties of the distribution in terms
of the moments.


3.34. In particular we expect that if two distributions have a certain number of 
moments in common they will bear some resemblance to each other. If, say, moments 
up to those of order n are identical we know that as n tends to infinity the distributions 
approach identity, and consequently we expect that by identifying the lower moments 
of two distributions we bring them to approximate equality. Some mathematical support 
for this so-called Principle of Moments may be derived from the following approach : 

It is known that a function which is continuous in a finite range a to b can be
represented in that range by a uniformly convergent series of polynomials in x, say
$\sum_{n=0}^{\infty} P_n(x)$, where $P_n(x)$ is of degree n. Suppose we wish to represent such a function
approximately by the finite series of powers $\sum_{\nu=0}^{n} a_\nu x^{\nu}$. The coefficients $a_\nu$ may be
determined by the principle of least squares, i.e. so as to make

$$\int_a^b \Big(f - \sum a_\nu x^{\nu}\Big)^{2} dx \qquad (3.68)$$

a minimum. Differentiating by $a_j$ we have

$$2\int_a^b \Big(f - \sum a_\nu x^{\nu}\Big)x^{j}\,dx = 0$$

or

$$\int_a^b f x^{j}\,dx = \mu'_j = \int_a^b \sum a_\nu x^{\nu+j}\,dx. \qquad (3.69)$$

If now two distributions have moments up to order n equal they must have the same
least-squares approximation, for the coefficients $a_\nu$ are determined by the moments in
virtue of (3.69). Furthermore, if in the range the distribution $f_1$ differs from $\sum a_\nu x^{\nu}$ by
$\epsilon_1$ and $f_2$ by $\epsilon_2$, then $f_1$ differs from $f_2$ by not more than $\epsilon_1 + \epsilon_2$.
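
That the least-squares polynomial depends on the function only through its moments may be seen by solving equations (3.69) with the moments as the sole input. The following is a sketch under simplifying assumptions: the range is taken to be 0 to 1, so that the integral of x^(v+j) is 1/(v+j+1), the moments are numbered from order 0, and the linear equations are solved in exact rational arithmetic. The function name is hypothetical.

```python
from fractions import Fraction

def least_squares_coeffs(moments):
    """Coefficients a_0, ..., a_n of the least-squares polynomial on (0, 1),
    found from the moments mu'_0, ..., mu'_n alone by solving (3.69)."""
    n = len(moments)
    # Augmented matrix: row j is  sum_v a_v / (v + j + 1) = mu'_j.
    A = [[Fraction(1, v + j + 1) for v in range(n)] + [Fraction(moments[j])]
         for j in range(n)]
    # Gauss-Jordan elimination.
    for col in range(n):
        piv = next(r for r in range(col, n) if A[r][col] != 0)
        A[col], A[piv] = A[piv], A[col]
        A[col] = [x / A[col][col] for x in A[col]]
        for r in range(n):
            if r != col and A[r][col] != 0:
                A[r] = [x - A[r][col] * y for x, y in zip(A[r], A[col])]
    return [row[n] for row in A]

# Moments of orders 0, 1, 2 of the frequency function 6x(1 - x) on (0, 1).
print(least_squares_coeffs([Fraction(1), Fraction(1, 2), Fraction(3, 10)]))
```

Any two frequency functions on this range with the same moments of orders 0, 1 and 2 would lead to the same quadratic; for the function 6x(1 - x), itself a quadratic, the coefficients 0, 6 and -6 are recovered exactly.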

A similar line of approach may be adopted when the range is infinite, the distributions 
in such cases being, under certain general conditions, capable of representation by a series 
of terms such as e’~''^'PJ^x). (Cf. Chapter 6.) The same conclusion is reached. 

Thus distributions which have a finite number of the lower moments in common will,
in a sense, be approximations one to another. We shall encounter many cases where,
although we cannot determine a distribution function explicitly, we may ascertain its
moments at least up to some order; and hence we shall be able to approximate to the
distribution by finding another distribution of known form which has the same lower
moments. In practice, approximations of this kind often turn out to be remarkably good,
even when only the first three or four moments are equated.


Mean Values 


3.35. To conclude this chapter we may note that the moments are particular cases
of a general class of functions known as Mean Values. If we have a function $\psi(x)$ defined
in the range of a distribution, then

$$\int_{-\infty}^{\infty} \psi(x)\,dF, \qquad (3.70)$$

if it exists, is called the mean value of $\psi(x)$ for that distribution ; it is sometimes written
as $E\{\psi(x)\}$, a notation we shall often find useful. The moment of order r is thus the mean
value of $x^{r}$ and the characteristic function is the mean value of $e^{itx}$. The letter E in this
connection is the first letter of the word “expectation,” and mean values as we have defined
them are sometimes known as “expected” values, particularly in the theory of probability.
The objection to this practice is that only rarely is it to be expected that we shall meet with
the “expected” value in sampling.


3.36. Two important properties of mean values are to be noted. In the first place,
if we have two functions $\psi_1(x)$ and $\psi_2(x)$,

$$\int \psi_1\,dF + \int \psi_2\,dF = \int (\psi_1 + \psi_2)\,dF$$

and thus

$$E(\psi_1 + \psi_2) = E(\psi_1) + E(\psi_2), \qquad (3.71)$$

i.e. the mean value of a sum is the sum of the mean values.

Secondly, if we have two independent variates $x_1$ and $x_2$ distributed with functions
$F_1$, $F_2$ ; and if $\psi_1$ is a function of $x_1$ and $\psi_2$ of $x_2$, then

$$\int\int \psi_1\psi_2\,dF_1\,dF_2 = \int \psi_1\,dF_1 \int \psi_2\,dF_2$$

or,

$$E(\psi_1\psi_2) = E(\psi_1)E(\psi_2), \qquad (3.72)$$

so that the mean value of the product is the product of the mean values. This is in general
only true if the variates are independent, whereas (3.71) is subject to no such restriction.


NOTES AND REFERENCES

In most of the literature what have here been called “cumulants” are referred to as
semi-invariants or seminvariants. They were introduced by Thiele (1889), who, however,
failed to draw a clear distinction between the parameters of a population and estimates
of those parameters from a sample, with the result that for some years there was a confusion
between semi-invariant parameters and semi-invariant statistics. (This is in no way to
be interpreted as a criticism of Thiele, who could hardly have been expected to write fifty
years ahead of his time.) Some recent work by Dressel (1940) has shown the desirability
of reserving the name “seminvariant” for the more general class of parameters which





are, except for powers of invariant under transformations of the origin. Dressel points 
out the analogy between such parameters and the functions of the coefficients of the binary 
form 

UqX^ 4” 4“ • • • "f" 4“ • • • 

which are invariant under transformations of type 

f == Zx 4“ w?, y = 

The word “seminvariant” has been in use for many years in the theory of algebraic
invariants to denote such functions. The word “cumulant” is due to Fisher and Wishart.

A comprehensive account of the mathematical relations between moments, factorial
moments and cumulants is given by Frisch (1926).

There is an extensive literature on corrections for grouping. Kendall (1938) gave
a bibliography which appears to be complete except for the omission of a paper by Fisher
(1921) and one by Elderton (1938b). For corrections in the case when the Sheppard
conditions are violated, see Pairman and Pearson (1918), Sandon (1924), Martin (1934),
Pearse (1928) and Elderton (1938a). For Sheppard’s corrections for a discrete variable
(which appear to be due to H. C. Carver) see Craig (1936) ; and for the corrections in the
multivariate case see Wold (1934b).

References to the problem of moments (i.e. the conditions under which a set of constants
can form the moments of a distribution) are given at the end of Chapter 4. As to the
mathematical basis of the principle of moments, see Merzrath (1933) and Romanovsky (1936).

Craig, C. C. (1936), “Sheppard’s corrections for a discrete variable,” Ann. Math. Statist.,
7, 55.

Dressel, P. L. (1940), “Statistical seminvariants and their estimates with particular
emphasis on their relation to algebraic invariants,” Ann. Math. Statist., 11, 33.

Elderton, Sir W. P. (1938a), Frequency Curves and Correlation, Cambridge University Press.

- (1938b), “Correzioni dei momenti quando la curva è simmetrica,” Giorn. dell’Ist.
Ital. Att., 16, 145.

Fisher, R. A. (1921), “On the mathematical foundations of theoretical statistics,” Phil.
Trans., A, 222, 309.

Frisch, R. (1926), “Sur les semi-invariants et moments employés dans l’étude des
distributions statistiques,” Oslo ; Skrifter af det Norske Videnskaps-Akademie, II,
Hist.-Filos. Klasse, No. 3.

Kendall, M. G. (1938), “The conditions under which Sheppard’s corrections are valid,”
J. Roy. Statist. Soc., 101, 592.

- (1941), “The derivation of multivariate sampling formulae from univariate formulae
by symbolic operation,” Ann. Eugen. Lond., 10, 392.

Martin, E. S. (1934), “On the correction for moment coefficients of frequency distributions
when the start of the frequency is one of the characteristics to be determined,”
Biometrika, 26, 12.

Merzrath, E. (1933), “Anpassung von Flächen an zweidimensionale Kollektivgegenstände
und ihre Auswertung für die Korrelationstheorie,” Metron, 11, No. 2, 103.

Pairman, E., and Pearson, K. (1918), “On corrections for the moment-coefficients of
limited range frequency-distributions when there are finite or infinite ordinates
and any slopes at the terminals of the range,” Biometrika, 12, 231.

Pearse, G. E. (1928), “On corrections for the moment-coefficients of frequency-distributions
when there are infinite ordinates at one or both terminals of the range,” Biometrika,
20A, 314.

Romanovsky, V. (1936), “Note on the method of moments,” Biometrika, 28, 188.

Sandon, F. (1924), “Note on the simplification of the calculation of the abruptness coefficients
to correct crude moments,” Biometrika, 16, 193.

Shohat, J. (1929), “Inequalities for moments of frequency functions and for various
statistical constants,” Biometrika, 21, 361.

Thiele, T. N. (1889), Theory of Observations (reprinted in English in Ann. Math. Statist.,
1931, 2, 165).

Wold, H. (1934a), “Sulla correzione di Sheppard,” Giorn. dell’Ist. Ital. Att., 4, 304.

- (1934b), “Sheppard’s correction formulae in several variables,” Skand. Aktuar.,
17, 248.


EXERCISES 

3.1. Show that the rth moment about the origin of the distribution

$$dF = k\,x^{-p}e^{-\gamma/x}\,dx, \qquad 0 \le x < \infty, \quad \gamma > 0,$$

is

$$\mu'_r = \frac{\gamma^{r}\,\Gamma(p - r - 1)}{\Gamma(p - 1)}$$

if r < p − 1, and does not exist in the contrary case.

3.2. In the distribution 

( o«2\-‘TO 

1 + — j ^ dx — oo<:r<(X) 

show that, about the origin, 


cos 




and hence that 


p; ‘J*' 

2 

r 




3.3. Show that the discontinuous distribution whose frequencies corresponding to
the values 0, 1, . . . j . . . are

$$e^{-m}\left(1, \frac{m}{1!}, \frac{m^{2}}{2!}, \dots, \frac{m^{j}}{j!}, \dots\right)$$

has, for the moments about the mean,

$$\mu_2 = m, \quad \mu_3 = m, \quad \mu_4 = m(1 + 3m), \quad \mu_5 = m(1 + 10m), \quad \mu_6 = m(1 + 25m + 15m^{2}).$$

3.4. Show that for the distribution whose frequencies for variate-values 0, 1, . . . j, . . .
are the successive terms in $(\tfrac{1}{2} + \tfrac{1}{2})^{n}$, i.e. $(\tfrac{1}{2})^{n}\left\{1, \binom{n}{1}, \binom{n}{2}, \dots\right\}$, all cumulants of odd
order except the first vanish.



3.5. Show generally that the cumulants of odd order vanish for any symmetrical
distribution, except the first.

3.6. Show that may be expanded in an infinite series, valid in — oo < a: < oo, 

^[2] 

the factorials being taken with unit interval; and hence that 

^Ir] ~ 

d 


where 


d = 


d(e^^y 


Hence show that, for the binomial (q + py about the origin, 


3.7. Show that the distribution whose frequency at the variate-value i 2r (r integral) is 

,2r-f i 

+ • . 


_I- ^ _L 


and at db (2r + 1) is 


\0!(2r)! l!(2r + 1)! 2!(2r + 2)! 

„2r+3 


-Saf , _« 

[(2r -f 1)! l!(2i 


y2r4-5 


!f2r + 2)! + 2!(2r + 3)! 
has odd-order cumulants equal to zero and even-order cumulants equal to 2a. 

3.8. Show that for the distribution

$$dF = \frac{1}{\sigma}\,e^{-x/\sigma}\,dx, \qquad 0 \le x < \infty,$$

$$\kappa_r = \sigma^{r}(r - 1)!$$

3.9. Show that


and hence that


K, = (- 



1 


n\ . 

/^2 

(oh 


(oh 

l^r 

(r-\\ . 

0 h- 


0 

1 


0 

0 


0 

0 


(r - 2)/^^ 





3.10. Show that for the distribution

$$dF = dx, \qquad 0 \le x \le 1,$$

grouped into an integral number of intervals of equal width h, the corrections to the second
and fourth moments about the mean are

+ /i. Y + 

(Cf. Elderton, 1938b. Note that the first is exactly, and the second approximately, the
Sheppard correction with sign reversed.)

3.11. If stands for the operator such that 

== '■‘"VV-p* ‘r>P 

~ 0 rep 

and is distributive when applied to products, e.g. 

d^{AB) = B{d„A) + A(d„B), 

show that dp annihilates every cumulant (considered as a function of the moments) except
$\kappa_p$, and that

(Cf. Kendall, 1941.) 

3.12. If f(x) is an odd function of x of period ½ show that

$$\int_0^{\infty} x^{r}\,x^{-\log x} f(\log x)\,dx = 0$$

for all integral values of r. Hence show that the distributions

$$dF = x^{-\log x}\{1 - \lambda \sin(4\pi \log x)\}\,dx, \qquad 0 \le x < \infty, \quad 0 \le \lambda \le 1,$$

have the same moments whatever the value of $\lambda$. (Stieltjes. See refs. to Chapter 4.)

3.13. Show that if the frequencies of a discontinuous distribution are distributed
at equal intervals $h/m$, m in each grouping interval h, the average grouping corrections to
the cumulants are given by



(Cf. Craig, 1936.) 

3.14. Liapounoff’s inequality for moments. Beginning with the inequality 

$$\Big(\sum ab\Big)^{2} \le \Big(\sum a^{2}\Big)\Big(\sum b^{2}\Big)$$
show that for positive values ... 

{lx~) < {Ix^'XFx'^). 





Hence that 

is true when p is of form $2^{m}$. Hence show that it is true for any integral p by noting that
if $2^{m}$ is the smallest power of 2 greater than p we may take

(Xj-f" • • • OCp 

*p+l — “p +2 = • . . « 2 »n = ~ • 

Hence, putting p == a — c, aj = . . . == a„_6 = c, n^b-i — • • • ®o-e = ®< show that 

P«-c < 

(The inequality remains true for a continuous variate, as may be seen by considering limiting
processes.) 


3.15. Show that for the bivariate distribution 


dF = 


1 


2jt(ricr,(l - p*yt 2(1 — 

all cumulants r, s > 2, vanish ; and further, if

3 - 

^ra 

= (r + s - 1)pK-u ,-i + (r -1)(5 - l)(l - 

(2r)!(2s)! v- {2p)^> 

+* -4c (r-/)!(«-i)!(2i)! 


1 


rf axCTa a^J 


GO < a:, < cx) 


^2r, 2a ““ 




'^2r+l, 2s+l 


(2r + l)!(2^+ 1)! 


9r*f« 


I 

^ (r - j)! 


(2p)>‘>- 




(r - j)!(« - j )!(27 + 1)1 


^2r, 2«+l — ^2r+l. 2* — 


where < is the smaller of r and s. In particular, 

All p) A 31 Ap, Asi = l^pf All 105p, A 91 = 94*5p j 

A„ - (1 + 2p*), A*. = 3(1 + 4p*), A., = 15(1 + 6p*) 

A,« = 106(1 + 8p*), A,. ,0 = 946(1 + lOp*); 

A„ = 3p(3 + 2p*), A,5 = 15p(3 + 4p*), 

A.4 = 3(3 + 24p* + 8p*). 





CHAPTER 4 

CHARACTERISTIC FUNCTIONS 


Moment- and Cumulant-Generating Functions *

4.1. In the previous chapter we considered the characteristic function 

e^dF .... (4.1) 

as a moment-generating function. We have also 

y>{t) = log <^{t) = .(4.2] 


ip(t) being known as the Cumulative Function. It generates the cumulants in the same 
way that the characteristic function generates the moments. If the moment of order r 
exists, can be expanded in powers of t at least as far as the term in (it)', and so can y>(t). 

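Other functions can be constructed which generate the moments. For example,
since for $|tx| < 1$

$$\frac{1}{1 - tx} = \sum_{r=0}^{\infty} t^r x^r,$$

we have the formal expansion

$$\int_{-\infty}^{\infty} \frac{dF}{1 - tx} = \sum_{r=0}^{\infty} \mu'_r t^r. \tag{4.3}$$

Generally if a function $\zeta(t)$ can be expanded as a power series in $t$, $\sum a_j t^j$, we have, subject
to existence (and convergence when the series is infinite),

$$\int_{-\infty}^{\infty} \zeta(tx)\,dF = \sum a_j t^j \mu'_j. \tag{4.4}$$

Since

$$(1 + t)^x = \sum_{j=0}^{\infty} \frac{t^j x^{[j]}}{j!},$$

where $x^{[j]} = x(x-1)\ldots(x-j+1)$, we have

$$\omega(t) = \int_{-\infty}^{\infty} (1 + t)^x\,dF = \sum_{j=0}^{\infty} \frac{t^j}{j!}\,\mu'_{[j]} \tag{4.5}$$

and thus $\omega(t)$ may be regarded as a factorial moment-generating function. We may also
define a factorial cumulant-generating function

$$\log \omega(t) = \sum_{r=1}^{\infty} \kappa_{[r]} \frac{t^r}{r!}, \tag{4.6}$$

though this function has not come into general use.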
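
As an illustration of the factorial moment-generating function, the following is a minimal sketch in Python, assuming a Poisson variate with mean `lam` (an example of our own choosing, not one taken from the text). For that distribution the sum defining the factorial moment-generating function is $e^{\lambda t}$, so that the factorial moment of order $r$ should be $\lambda^r$. The function names and the truncation point `kmax` are hypothetical.

```python
import math

def poisson_pmf(lam, k):
    # P(X = k) for a Poisson variate with mean lam (illustrative assumption)
    return math.exp(-lam) * lam**k / math.factorial(k)

def falling_factorial(x, r):
    # x^{[r]} = x (x - 1) ... (x - r + 1), with x^{[0]} = 1
    out = 1
    for i in range(r):
        out *= (x - i)
    return out

def factorial_moment(lam, r, kmax=80):
    # mu'_{[r]} = sum over k of k^{[r]} P(X = k), truncated at kmax terms
    return sum(falling_factorial(k, r) * poisson_pmf(lam, k) for k in range(kmax))

def omega(lam, t, kmax=80):
    # omega(t) = E (1 + t)^X, the factorial moment-generating function
    return sum((1 + t)**k * poisson_pmf(lam, k) for k in range(kmax))

lam = 1.5
factorial_moments = [factorial_moment(lam, r) for r in range(5)]
```

Comparing `factorial_moments` with the powers of `lam`, and `omega(lam, t)` with `math.exp(lam * t)`, gives a quick numerical check of (4.5) for this particular case.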


4.2. The generation of moments is by no means the most important property of the 
characteristic function, and in this chapter we discuss some of the theorems which give 
it a fundamental place in statistical theory. 




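We recall, in the first instance, that $\phi(t)$ always exists, since

$$\left|\int_{-\infty}^{\infty} e^{itx}\,dF\right| \le \int_{-\infty}^{\infty} |e^{itx}|\,dF = \int_{-\infty}^{\infty} dF = 1, \tag{4.7}$$

so that the defining integral converges absolutely. Further, $\phi(t)$ is uniformly continuous
in $t$ and differentiable $j$ times under the integral sign if the resulting expressions exist
and are uniformly convergent, for which it is sufficient that $\nu_j$ exists. For then

$$|\phi^{(j)}(t)| = \left| i^j \int_{-\infty}^{\infty} x^j e^{itx}\,dF \right| \le \int_{-\infty}^{\infty} |x|^j\,dF = \nu_j. \tag{4.8}$$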


The Inversion Theorem 

4.3. We now prove the fundamental theorem of the theory of characteristic functions,
which will be called the Inversion Theorem, namely that the characteristic function uniquely
determines the distribution function; more precisely, if $\phi(t)$ is given by (4.1) then

$$F(x) - F(0) = \frac{1}{2\pi} \int_{-\infty}^{\infty} \frac{1 - e^{-ixt}}{it}\,\phi(t)\,dt, \tag{4.9}$$

the integral being understood as a principal value, i.e. as

$$\lim_{c \to \infty} \frac{1}{2\pi} \int_{-c}^{c} \frac{1 - e^{-ixt}}{it}\,\phi(t)\,dt.$$

Further, if $F(x)$ is continuous everywhere and $dF = f(x)\,dx$,

$$f(x) = \frac{1}{2\pi} \int_{-\infty}^{\infty} \phi(t)\,e^{-ixt}\,dt, \tag{4.10}$$

the integral, as before, being a principal value if there is not separate convergence at the
limits. Equation (4.10) may be compared with the form

$$\phi(t) = \int_{-\infty}^{\infty} f(x)\,e^{itx}\,dx, \tag{4.11}$$

the comparison exhibiting the kind of reciprocal relationship which exists between $f(x)$
and $\phi(t)$.

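As a preliminary we require an integral due to Dirichlet. It is easy to show that

$$J = \int_0^{\infty} \frac{\sin x}{x}\,dx = \frac{\pi}{2}.$$

Putting

$$u_n = \left| \int_{n\pi}^{(n+1)\pi} \frac{\sin x}{x}\,dx \right|$$

we have

$$J = u_0 - u_1 + u_2 - \ldots + (-1)^n u_n + \ldots$$

in which the terms decrease monotonically to zero in absolute value. Now let $H(x)$ be
a positive decreasing function. Consider

$$I_p = \int_0^{\infty} H\!\left(\frac{x}{p}\right) \frac{\sin x}{x}\,dx = \sum_{n=0}^{\infty} \int_{n\pi}^{(n+1)\pi} H\!\left(\frac{x}{p}\right) \frac{\sin x}{x}\,dx. \tag{4.12}$$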







Writing $H(+0)$ as the limit of $H(\varepsilon)$ as $\varepsilon \to 0$ ($\varepsilon$ positive), we have, in virtue of the decreasing
property of $H$, that any term in the series on the right in (4.12) is not greater than $u_n H(+0)$.
Further, as the series alternates in sign, the difference between $I_p$ and

$$\int_0^{n\pi} H\!\left(\frac{x}{p}\right) \frac{\sin x}{x}\,dx$$

is less than $u_n H(+0)$, which tends to zero uniformly in $p$. Consequently $I_p$ is uniformly
convergent and we have

$$\lim_{p \to \infty} \int_0^{\infty} H\!\left(\frac{x}{p}\right) \frac{\sin x}{x}\,dx = \frac{\pi}{2} H(+0). \tag{4.13}$$

Similarly we have

$$\lim_{p \to \infty} \int_{-\infty}^{0} H\!\left(\frac{x}{p}\right) \frac{\sin x}{x}\,dx = \frac{\pi}{2} H(-0). \tag{4.14}$$

By a simple change of sign the results are seen to be true if $H(x)$ is a negative increasing
function. It is therefore true of any function which can be expressed as the sum or difference
of a positive decreasing function and a negative increasing function, and in particular
of a frequency function or distribution function.

Adding (4.13) and (4.14) and writing $H(0)$ for $\tfrac{1}{2}\{H(+0) + H(-0)\}$ we have

$$\lim_{p \to \infty} \int_{-\infty}^{\infty} H\!\left(\frac{x}{p}\right) \frac{\sin x}{x}\,dx = \pi H(0), \tag{4.15}$$

and putting $px$ for $x$ in this expression,

$$\lim_{p \to \infty} \int_{-\infty}^{\infty} H(x)\,\frac{\sin px}{x}\,dx = \pi H(0), \tag{4.16}$$

the so-called Dirichlet integral. If $H(x)$ is continuous at $x = 0$ the value

$$\tfrac{1}{2}\{H(+0) + H(-0)\}$$

is of course the usual value $H(x = 0)$.

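Now consider

$$J_c = \int_{-c}^{c} \phi(t)\,dt \int_0^{X} e^{-it\xi}\,d\xi. \tag{4.17}$$

Putting

$$\phi(t) = \int_{-\infty}^{\infty} e^{itx}\,dF(x)$$

we have

$$J_c = \int_{-c}^{c} dt \left\{ \int_{-\infty}^{\infty} e^{itx}\,dF(x) \int_0^{X} e^{-it\xi}\,d\xi \right\}.$$

The product in curly brackets may be equated to the double integral

$$\int_0^{X} \int_{-\infty}^{\infty} e^{it(x - \xi)}\,dF(x)\,d\xi,$$

which is evidently uniformly convergent. Making the transformation

$$y = x - \xi$$

and integrating with respect to $\xi$, this becomes

$$\int_{-\infty}^{\infty} e^{ity}\{F(y + X) - F(y)\}\,dy.$$

This also is uniformly convergent in $t$ and hence, integrating under the integral sign with
respect to $t$, we have

$$J_c = \int_{-\infty}^{\infty} \{F(y + X) - F(y)\}\,\frac{2 \sin cy}{y}\,dy, \tag{4.18}$$

since $\cos cy$ is an even function.

Now (4.18) is a Dirichlet integral and we have, therefore,

$$\lim_{c \to \infty} J_c = 2\pi\{F(X) - F(0)\}. \tag{4.19}$$

Referring again to (4.17) we thus have, writing now $x$ for $X$,

$$F(x) - F(0) = \frac{1}{2\pi} \lim_{c \to \infty} \int_{-c}^{c} \phi(t)\,dt \int_0^{x} e^{-it\xi}\,d\xi,$$

and integrating with respect to $\xi$,

$$F(x) - F(0) = \lim_{c \to \infty} \frac{1}{2\pi} \int_{-c}^{c} \frac{1 - e^{-ixt}}{it}\,\phi(t)\,dt,$$

which is the result stated in (4.9). It is to be remembered that in virtue of our convention
in arriving at (4.16), $F(x)$ at a saltus is $\tfrac{1}{2}\{F(x+) + F(x-)\}$.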


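4.4. This expression may be thrown into an alternative form. From the definition
of $\phi(t)$ it is seen that $\phi(t)$ and $\phi(-t)$ are conjugate quantities, and we thus have

$$R(t) = \tfrac{1}{2}\{\phi(t) + \phi(-t)\},$$

$$I(t) = \frac{1}{2i}\{\phi(t) - \phi(-t)\},$$

$R$ and $I$ being the real and imaginary parts of $\phi(t)$. Thus

$$F(x) - F(0) = \frac{1}{2\pi} \int_{-\infty}^{0} \frac{1 - e^{-ixt}}{it}\,\phi(t)\,dt + \frac{1}{2\pi} \int_0^{\infty} \frac{1 - e^{-ixt}}{it}\,\phi(t)\,dt,$$

and by a change of sign in $t$ in the first integral,

$$F(x) - F(0) = \frac{1}{2\pi} \int_0^{\infty} \left\{ \frac{1 - e^{-ixt}}{it}\,\phi(t) - \frac{1 - e^{ixt}}{it}\,\phi(-t) \right\} dt = \frac{1}{\pi} \int_0^{\infty} \frac{R(t) \sin xt + I(t)(1 - \cos xt)}{t}\,dt. \tag{4.20}$$

This integral is, of course, real.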


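4.5. If now $F(x)$ has a derivative $f(x)$ we have

$$f(x) = \frac{d}{dx} \left\{ \frac{1}{2\pi} \int_{-\infty}^{\infty} \frac{1 - e^{-ixt}}{it}\,\phi(t)\,dt \right\}.$$

The integral being uniformly convergent in $x$, the differentiation can be carried out on
the integrand, and we have

$$f(x) = \frac{1}{2\pi} \int_{-\infty}^{\infty} e^{-ixt}\,\phi(t)\,dt,$$

the integral being a principal value.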

4.6. Consider now the expression, with a slightly different definition of $J_c$,

$$\frac{J_c}{2c} = \frac{1}{2c} \int_{-c}^{c} \phi(t)\,e^{-ixt}\,dt. \tag{4.21}$$

If the distribution function $F$ has a derivative $f$, this is equal in the limit to

$$\lim_{c \to \infty} \frac{J_c}{2c} = \lim_{c \to \infty} \frac{2\pi}{2c}\,f(x) = 0$$

and consequently tends to zero everywhere where $F(x)$ is continuous and differentiable,
i.e. if the frequency-distribution is continuous.

If, however, the distribution is discontinuous, consider one point of discontinuity,
say the frequency $f_j$ at $x_j$. The contribution of this part of the frequency to $\phi(t)$ will be
$f_j e^{itx_j}$ and thus the contribution to $J_c/2c$ will be

$$\frac{f_j}{2c} \int_{-c}^{c} e^{it(x_j - x)}\,dt = f_j\,\frac{\sin c(x_j - x)}{c(x_j - x)}.$$

If $x \ne x_j$ this clearly tends to zero; but if $x = x_j$ it becomes

$$\frac{f_j}{2c} \int_{-c}^{c} dt = f_j.$$

Thus the function $J_c/2c$ tends to $f_j$ at $x = x_j$.

Hence, if $J_c/2c$ tends to zero at a point $x$, there is no discontinuity in the distribution
function at that point; but if it tends to a positive number $f_j$, the distribution function
is discontinuous at that point and the frequency is $f_j$. This gives us a criterion whether
a given characteristic function represents a continuous distribution or not.


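Example 4.1

We found in Example 3.10 that the characteristic function of the normal distribution

$$dF = \frac{1}{\sigma\sqrt{2\pi}}\,e^{-x^2/2\sigma^2}\,dx, \qquad -\infty \le x \le \infty,$$

is $\phi(t) = e^{-t^2\sigma^2/2}$.

Suppose we are given such a function and require to find the distribution, if any, of which
it is the characteristic function.

In the first place we note that the distribution, if any, is continuous. For

$$\frac{J_c}{2c} = \frac{1}{2c} \int_{-c}^{c} e^{-t^2\sigma^2/2 - itx}\,dt.$$

The integral is less in modulus than $\int_{-c}^{c} e^{-t^2\sigma^2/2}\,dt$, which is less than

$$\int_{-\infty}^{\infty} e^{-t^2\sigma^2/2}\,dt = \frac{\sqrt{2\pi}}{\sigma}.$$

Thus $J_c/2c \to 0$ everywhere. We have then for the frequency function, if any,

$$f(x) = \frac{1}{2\pi} \int_{-\infty}^{\infty} e^{-t^2\sigma^2/2 - itx}\,dt = \frac{1}{2\pi}\,e^{-x^2/2\sigma^2} \int_{-\infty}^{\infty} \exp\left\{-\frac{\sigma^2}{2}\left(t + \frac{ix}{\sigma^2}\right)^2\right\} dt.$$

This may be regarded as an integral in the complex plane along the line parallel to the
real axis. Taking $t + ix/\sigma^2$ as the new variable in place of $t$, we find that the integral is
in fact

$$\int_{-\infty}^{\infty} e^{-\sigma^2\zeta^2/2}\,d\zeta = \frac{\sqrt{2\pi}}{\sigma}.$$

Thus

$$f(x) = \frac{1}{\sigma\sqrt{2\pi}}\,e^{-x^2/2\sigma^2}.$$

This is everywhere positive and $\int dF$ converges. Hence it is in fact a frequency function
with the given expression as a characteristic function.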

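Example 4.2

To find the frequency function, if any, for which

$$\phi(t) = e^{-|t|}.$$

We note that $J_c/2c$ tends to zero and that the distribution, if any, is continuous. We then
have for $f(x)$, if it exists,

$$f(x) = \frac{1}{2\pi} \int_{-\infty}^{\infty} e^{-|t|}\,e^{-itx}\,dt = \frac{1}{2\pi} \left\{ \int_0^{\infty} e^{-t}\,e^{-itx}\,dt + \int_{-\infty}^{0} e^{t}\,e^{-itx}\,dt \right\} = \frac{1}{\pi} \int_0^{\infty} e^{-t} \cos tx\,dt.$$

This may be evaluated by two partial integrations. We find

$$f(x) = \frac{1}{\pi} \Big[ -e^{-t} \cos tx \Big]_0^{\infty} - \frac{x}{\pi} \int_0^{\infty} e^{-t} \sin tx\,dt$$

$$= \frac{1}{\pi} + \frac{x}{\pi} \Big[ e^{-t} \sin tx \Big]_0^{\infty} - \frac{x^2}{\pi} \int_0^{\infty} e^{-t} \cos tx\,dt$$

$$= \frac{1}{\pi} - x^2 f(x).$$

Thus

$$f(x) = \frac{1}{\pi(1 + x^2)}, \qquad -\infty \le x \le \infty.$$

As before, this function can represent a frequency function, and it is readily verified that
$f(x)$ has, in fact, the required characteristic function.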
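
The inversions of Examples 4.1 and 4.2 may also be checked numerically. The following is a rough sketch in Python which approximates the integral in (4.10) by the trapezoidal rule over a finite range $-c$ to $c$; the function name `invert_cf`, the cut-off `c` and the number of steps are illustrative assumptions, and the method is suitable only when the characteristic function is negligible beyond the cut-off.

```python
import cmath
import math

def invert_cf(phi, x, c=40.0, steps=20000):
    # Approximates f(x) = (1/2pi) * integral_{-c}^{c} phi(t) exp(-i x t) dt
    # by the composite trapezoidal rule (a numerical stand-in for (4.10)).
    h = 2 * c / steps
    total = 0j
    for k in range(steps + 1):
        t = -c + k * h
        weight = 0.5 if k in (0, steps) else 1.0
        total += weight * phi(t) * cmath.exp(-1j * x * t)
    return (total * h / (2 * math.pi)).real

def normal_cf(t):
    # characteristic function of Example 4.1 with sigma = 1
    return math.exp(-t * t / 2)

def example_4_2_cf(t):
    # characteristic function of Example 4.2
    return math.exp(-abs(t))

f_normal = invert_cf(normal_cf, 0.7)
f_second = invert_cf(example_4_2_cf, 1.0)
```

Here `f_normal` may be compared with $e^{-0.245}/\sqrt{2\pi}$ and `f_second` with $1/(2\pi)$, the values given by the two frequency functions found above.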


Example 4.3

Does there exist a frequency function for which

$$\phi(t) = e^{it}\,?$$

We have

$$\frac{J_c}{2c} = \frac{1}{2c} \int_{-c}^{c} e^{it}\,e^{-itx}\,dt = \frac{1}{2c} \int_{-c}^{c} e^{i(1 - x)t}\,dt.$$

If $1 - x$ is not zero the integral is

$$\int_{-c}^{c} [\cos\{(1 - x)t\} + i \sin\{(1 - x)t\}]\,dt.$$

Since $\sin t$ is an odd function this is equal to

$$\frac{2 \sin\{(1 - x)c\}}{1 - x}.$$

This does not converge, but it is bounded and hence

$$\frac{J_c}{2c} \to 0.$$

If, however, $x = 1$, the integral is simply

$$\int_{-c}^{c} dt = 2c$$

and thus

$$\frac{J_c}{2c} = 1.$$

Thus there is unit frequency at $x = 1$ and it is seen at once that this accounts for
the whole of the frequency, so that there is no frequency elsewhere. The distribution
thus consists of a unit at $x = 1$. This is otherwise evident from the consideration that
$\log \phi(t) = it$, so that the second cumulant is zero and there is no dispersion.
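
The criterion of 4.6 lends itself to a simple numerical experiment. The sketch below, in Python, evaluates $J_c/2c$ of (4.21) by the trapezoidal rule for the characteristic function $e^{it}$ of this example; the function names and the step count are hypothetical choices made for illustration.

```python
import cmath
import math

def jc_over_2c(phi, x, c, steps=20000):
    # (1/2c) * integral_{-c}^{c} phi(t) exp(-i x t) dt, by the trapezoidal rule
    h = 2 * c / steps
    total = 0j
    for k in range(steps + 1):
        t = -c + k * h
        weight = 0.5 if k in (0, steps) else 1.0
        total += weight * phi(t) * cmath.exp(-1j * x * t)
    return total * h / (2 * c)

def unit_at_one_cf(t):
    # phi(t) = exp(it), the characteristic function examined in Example 4.3
    return cmath.exp(1j * t)

at_the_saltus = jc_over_2c(unit_at_one_cf, 1.0, c=200.0)
away_from_it = jc_over_2c(unit_at_one_cf, 0.5, c=200.0)
closed_form = math.sin(0.5 * 200.0) / (0.5 * 200.0)
```

At $x = 1$ the computed value stays at unity however large $c$ is taken, whereas at $x = 0.5$ it agrees with $\sin\{(1-x)c\}/\{(1-x)c\}$ and so dies away as $c$ increases.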


Example 4.4

For what distribution, if any, are the cumulants given by $\kappa_r = (r - 1)!$ ?

The series

$$\sum_{r=1}^{\infty} \kappa_r \frac{(it)^r}{r!} = \sum_{r=1}^{\infty} \frac{(it)^r}{r}$$

converges absolutely for $|t| < 1$ and is thus equal to $\psi(t)$ if such a function exists. We have

$$\psi(t) = -\log(1 - it)$$

and thus

$$\phi(t) = \frac{1}{1 - it}.$$

If the frequency function exists we have

$$f(x) = \frac{1}{2\pi} \int_{-\infty}^{\infty} \frac{e^{-itx}}{1 - it}\,dt.$$

This integral may be evaluated by integrating the complex function $\dfrac{e^{-ixz}}{1 - iz}$ round a contour
consisting of the real axis and the infinite semicircle below that axis. The first part reduces
to the integral we are seeking. On the semicircle of radius $R$ we have $z = R(\cos\theta + i\sin\theta)$
and the integrand becomes

$$\frac{\exp(-ixR\cos\theta + xR\sin\theta)}{1 - iR\cos\theta + R\sin\theta}.$$

$\theta$ here lies between $\pi$ and $2\pi$ and hence $\sin\theta$ is negative. Hence if $x$ is positive the expression
is less in modulus than

$$\frac{1}{R - 1},$$

i.e. tends to zero as $R \to \infty$.

Now the function $\dfrac{e^{-ixz}}{1 - iz}$ has a pole within the domain of integration at $z = -i$ and
the residue there is $ie^{-x}$. Hence, the contour being described in the negative sense,

$$f(x) = e^{-x}, \qquad 0 \le x \le \infty.$$

More generally, if $\kappa_r = p(r - 1)!$, $p > 0$, it will be found that the residue of

$$\frac{e^{-ixz}}{(1 - iz)^p}$$

is $\dfrac{i x^{p-1} e^{-x}}{\Gamma(p)}$, so that the distribution is

$$f(x) = \frac{x^{p-1} e^{-x}}{\Gamma(p)}, \qquad 0 \le x \le \infty,\ p > 0.$$

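Example 4.5

For what distribution, if any, are all cumulants of odd order zero and those of even
order a constant, say $2a$ ?

We have

$$\psi(t) = \sum_{r=1}^{\infty} 2a\,\frac{(it)^{2r}}{(2r)!}.$$

This series converges and

$$\psi(t) = 2a(\cos t - 1).$$

Hence

$$\phi(t) = e^{2a(\cos t - 1)}.$$

If we try to integrate $\int_{-\infty}^{\infty} e^{-itx + 2a(\cos t - 1)}\,dt$ in the ordinary way we fail. Let us
then look into the question of continuity of the distribution function.

We have

$$\frac{J_c}{2c} = \frac{e^{-2a}}{2c} \int_{-c}^{c} e^{2a\cos t}\,e^{-ixt}\,dt = \frac{e^{-2a}}{2c} \int_{-c}^{c} \sum_{j=0}^{\infty} \frac{(2a)^j}{j!} \cos^j t\;e^{-ixt}\,dt.$$

The series is uniformly convergent and hence

$$\frac{J_c}{2c} = \frac{e^{-2a}}{2c} \sum_{j=0}^{\infty} \frac{(2a)^j}{j!} \int_{-c}^{c} \cos^j t \cos xt\,dt,$$

since $\sin xt$ is an odd function.

Consider now the integral $\int_{-c}^{c} 2^j \cos^j t \cos xt\,dt$. By a well-known expansion

$$2^j \cos^j t \cos xt = \tfrac{1}{2}(e^{ixt} + e^{-ixt}) \left\{ e^{ijt} + \binom{j}{1} e^{i(j-2)t} + \ldots + e^{-ijt} \right\}.$$

The only part of this expression of present interest is the constant term, the others not
contributing more than a finite amount to $J_c$. The coefficient of $e^0$ is zero unless $x$ is integral
in absolute value, and in that case is

$$\binom{j}{\tfrac{1}{2}(j - |x|)},$$

this being taken as zero unless $j - |x|$ is a non-negative even integer.

Thus $J_c/2c$ tends to zero unless $x$ is integral in absolute value, and in the latter case

$$\frac{J_c}{2c} \to e^{-2a} \sum_{j} \frac{a^j}{j!} \binom{j}{\tfrac{1}{2}(j - |x|)}.$$

Thus, if $x$ is even, say $2r$, the frequency at $x = \pm 2r$ is

$$e^{-2a} \left\{ \frac{a^{2r}}{(2r)!} + \frac{a^{2r+2}}{(2r+2)!} \binom{2r+2}{1} + \frac{a^{2r+4}}{(2r+4)!} \binom{2r+4}{2} + \ldots \right\}$$

$$= e^{-2a} \left\{ \frac{a^{2r}}{(2r)!} + \frac{a^{2r+2}}{(2r+1)!\,1!} + \frac{a^{2r+4}}{(2r+2)!\,2!} + \ldots \right\},$$

and if $x$ is odd the frequency at $x = \pm(2r + 1)$ is

$$e^{-2a} \left\{ \frac{a^{2r+1}}{(2r+1)!} + \frac{a^{2r+3}}{(2r+2)!\,1!} + \frac{a^{2r+5}}{(2r+3)!\,2!} + \ldots \right\}.$$

We may now verify that these frequencies account for the whole of the characteristic
function and hence that all frequencies have been found.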
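
The verification mentioned at the end of Example 4.5 can be carried out numerically. The following Python sketch, with an arbitrarily chosen value of $a$ and hypothetical function names, sums the series for the frequency at each integer $x$ and checks that the frequencies total unity and reproduce the characteristic function $e^{2a(\cos t - 1)}$; the truncation points are assumptions adequate for moderate $a$.

```python
import cmath
import math

def frequency(a, x, terms=40):
    # e^{-2a} * sum_k a^{|x| + 2k} / ((|x| + k)! k!), the series of Example 4.5
    m = abs(x)
    series = sum(
        a ** (m + 2 * k) / (math.factorial(m + k) * math.factorial(k))
        for k in range(terms)
    )
    return math.exp(-2 * a) * series

def reconstructed_cf(a, t, xmax=30):
    # sum over integer x of frequency(x) * exp(i t x), truncated at |x| <= xmax
    return sum(frequency(a, x) * cmath.exp(1j * t * x) for x in range(-xmax, xmax + 1))

a = 0.8
total_frequency = sum(frequency(a, x) for x in range(-30, 31))
cf_error = abs(reconstructed_cf(a, 0.9) - math.exp(2 * a * (math.cos(0.9) - 1)))
```

With these settings `total_frequency` should differ from unity, and `cf_error` from zero, only by rounding error.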


Conditions for a Function to be a Characteristic Function

4.7. Any function which is not negative in its range of definition and which is
integrable in the Stieltjes sense can be a frequency function; and any non-decreasing
function which increases from 0 to 1 in its range of definition can be a distribution function.
There are much more restrictive conditions to be obeyed before a given function can be
a characteristic function.

In the first place, let us note that it is a necessary and sufficient condition for a function
$\phi(t)$ to be a characteristic function that

$$F(x) = \frac{1}{2\pi} \int_{-\infty}^{\infty} \frac{1 - e^{-ixt}}{it}\,\phi(t)\,dt$$

shall (except for an additive constant $F(0)$) be a distribution function. This, however,
is not a very helpful criterion in practice.

Looking to the definition of $\phi(t)$ we see that necessary conditions for $\phi(t)$ to be a
characteristic function are

(a) that $\phi(t)$ must be continuous in $t$,

(b) that $\phi(t)$ is defined in every finite $t$-interval,

(c) that $\phi(0) = 1$,

(d) that $\phi(t)$ and $\phi(-t)$ shall be conjugate quantities,

(e) that $|\phi(t)| \le \int |e^{itx}|\,dF \le 1 = \phi(0)$.

These criteria enable us to reject certain functions as possible characteristic functions,
but there do not appear to be any readily applicable sufficient conditions which enable
us to determine at sight whether a given function can be a characteristic function.
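
The use of these criteria for rejection may be sketched as follows in Python. The routine tests conditions (c), (d) and (e) on a finite grid of values of $t$; it can only reject a candidate, never confirm one, and the function name, grid and tolerance are hypothetical choices.

```python
import math

def failed_conditions(phi, grid, tol=1e-9):
    # Returns the labels of the necessary conditions (c), (d), (e) of 4.7
    # that the candidate phi violates somewhere on the grid.
    failed = set()
    if abs(complex(phi(0.0)) - 1) > tol:
        failed.add("c")  # phi(0) must equal 1
    for t in grid:
        value = complex(phi(t))
        if abs(complex(phi(-t)) - value.conjugate()) > tol:
            failed.add("d")  # phi(t) and phi(-t) must be conjugate
        if abs(value) > 1 + tol:
            failed.add("e")  # |phi(t)| must not exceed 1
    return sorted(failed)

grid = [0.1 * k for k in range(1, 101)]
passes = failed_conditions(math.cos, grid)             # cos t satisfies all three
rejected = failed_conditions(lambda t: 1 + t * t, grid)  # 1 + t^2 exceeds unity
```

An empty list, as for $\cos t$, means only that the candidate has not been rejected by these three tests.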


Limiting Properties of Distribution and Characteristic Functions

4.8. Suppose there is given a sequence of distribution functions $F_n(x)$ depending
on a parameter $n$ which can increase indefinitely. To each $F_n$ there will correspond
a characteristic function $\phi_n$. The question to be discussed is this: if $F_n$ tends to a limit
$F$, will $\phi_n$ tend to a limit $\phi$ and is $\phi$ the characteristic function of $F$ ? Conversely, if $\phi_n$
tends to a limit $\phi$, does $F_n$ tend to a limit $F$ and is $F$ a distribution function having $\phi$ for
its characteristic function ? The answers to these questions, as will be seen below, are
affirmative under certain general conditions.

It is to be noted what is meant by a distribution function tending to another. If
both are continuous, $F_n(x)$ is said to tend to $F(x)$ if, given any $\varepsilon$, there is an $n_0$ such that
$|F_n(x) - F(x)| < \varepsilon$ for all $n > n_0$. If there are discontinuities present, $F_n$ will be said
to tend to $F$ if it does so in every point of continuity of $F$. Since by definition our functions
are taken to be continuous on the left at saltuses, this evidently conforms to the definition
for the continuous case and to the common-sense requirements of the situation.

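4.9. We require two preliminary theorems for later work. The first is that if $F_n$
tends to a continuous $F$ it does so uniformly.

For the range can be divided into a finite number of parts, say at $\xi_1, \xi_2, \ldots, \xi_m$, such
that $F(\xi_{j+1}) - F(\xi_j) < \varepsilon/2$ for all $j$. Then as $n$ increases there will come a time when
$|F_n(\xi_j) - F(\xi_j)| < \varepsilon/2$ for all $j$. Thus there exists an $n_0$ such that these inequalities hold for $n > n_0$.
It is sufficient to show that this implies that for any $x$

$$|F_n(x) - F(x)| < \varepsilon, \qquad n > n_0.$$

In fact, if $x$ lies between $\xi_j$ and $\xi_{j+1}$,

$$F(\xi_j) \le F(x) \le F(\xi_{j+1}) < F(\xi_j) + \frac{\varepsilon}{2}$$

and

$$F(\xi_j) - \frac{\varepsilon}{2} < F_n(\xi_j) \le F_n(x) \le F_n(\xi_{j+1}) < F(\xi_{j+1}) + \frac{\varepsilon}{2} < F(\xi_j) + \varepsilon,$$

and thus

$$-\varepsilon < F_n(x) - F(x) < \varepsilon,$$

which is the required result.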





4.10. The second theorem we require (the Montel-Helly theorem) is that if the
sequence $F_n(x)$ is monotonic and bounded for all $x$ (which is so for distribution functions)
then we can pick out a subsequence $F_{n'}(x)$ which converges to some monotonic increasing
function $F$ (not necessarily a distribution function itself, for it may not vary from 0 to 1).

Consider first of all a series of values $x_1, x_2, \ldots$ It is known that every bounded
set of numbers contains a convergent sequence. Hence we can pick out from the sequence
$F_n(x_1)$ a convergent sequence, say, $F_{n_1}(x_1)$. Then from the subsequence $F_{n_1}(x_2)$ we can
pick out a subsequence $F_{n_2}(x_2)$, and $F_{n_2}(x)$ is thus convergent at both $x_1$ and $x_2$. Continuing
in this way we may, by picking out the first function in $F_{n_1}(x)$, the second in $F_{n_2}(x)$, and
so on, arrive at a sequence of functions $G_1(x), G_2(x), \ldots$ which converges at each of the
values $x_1, x_2, \ldots$ etc. This is the so-called Weierstrassian diagonal process.

It follows that the sequence $G_n(x)$ is convergent at every rational point $x$. Since
$G_n(a) \le G_n(x) \le G_n(b)$ for every $x$ between $a$ and $b$, we see that if $G_n(a)$ and $G_n(b)$ converge,
the limiting values of $G_n(x)$ lie between those limits, say $G(a)$ and $G(b)$.

Then the function $u(x)$ = upper bound of $G(x')$ for rational $x' \le x$ ($x$ not necessarily rational) is
well defined and non-decreasing and so has no more than an enumerable number of points
of discontinuity. If $u$ is continuous at $x$, we take $y$ and $z$ such that $y < x < z$ and
$u(z) - u(y) < \varepsilon$. Then if $a$ and $b$ are rational points such that $y < a < x < b < z$ it
follows that $u(y) \le G(a) \le G(b) \le u(z)$. Moreover, as all the limiting values of $G_n(x)$
are between $G(a)$ and $G(b)$, they are between $u(y)$ and $u(z)$. Hence, as $\varepsilon$ can be arbitrarily
small, we see that $G_n(x)$ tends to $u(x)$ at every point of continuity of $u$. Finally, by the
diagonal process, we can select a sequence which will also be convergent at the points of
discontinuity of $u(x)$. The theorem is established.


The First Limit Theorem 

4.11. We now prove the theorem: if a sequence of distribution functions $F_n$ tends
to a continuous distribution function $F$, then the corresponding sequence of characteristic
functions $\phi_n$ tends to $\phi$ uniformly in any finite $t$-interval, where $\phi$ is the characteristic
function of $F$.

It is required to prove that, given $\varepsilon$, there is an $n_0$ independent of $t$ such that

$$|\phi(t) - \phi_n(t)| = \left| \int_{-\infty}^{\infty} e^{itx}(dF - dF_n) \right| < \varepsilon, \qquad n > n_0.$$

Select two points of continuity of $F$, $X$ and $-X$. We can make $X$ as large as we
please. We then split the integral

$$\int_{-\infty}^{\infty} e^{itx}(dF - dF_n) \tag{4.22}$$

into two parts, that in the range $-X$ to $+X$ and that in the remaining portion of the
range. Now

$$\left| \int_{|x| > X} e^{itx}\,dF \right| \le \int_{|x| > X} dF \le 1 - F(X) + F(-X),$$

and by taking $X$ large enough we can make this quantity less than $\varepsilon/6$. Similarly

$$\left| \int_{|x| > X} e^{itx}\,dF_n \right| \le 1 - F_n(X) + F_n(-X),$$

and since $F_n$ tends to $F$ (and that uniformly) this, for sufficiently large $n$, will be less than $\varepsilon/3$.
Hence for some $n_0$ the portion of (4.22) outside the range $-X$ to $+X$ will be less in
modulus than $\dfrac{\varepsilon}{6} + \dfrac{\varepsilon}{3} = \dfrac{\varepsilon}{2}$. There remains the other part

$$\int_{-X}^{X} e^{itx}(dF - dF_n). \tag{4.23}$$

This expression is the limit of the sum

$$\sum e^{itx_j}\left[\{F(\xi_{j+1}) - F(\xi_j)\} - \{F_n(\xi_{j+1}) - F_n(\xi_j)\}\right], \tag{4.24}$$

$\xi_j$, $\xi_{j+1}$ being the boundaries of the interval into which the range is subdivided and $x_j$
a value in that interval. The difference between this sum and the limiting value can be
made less than $\varepsilon/4$ if the intervals are small enough; for if they are less than $\eta$ in width the
difference of $e^{itx}$ and $e^{itx_j}$ is less in modulus than $\eta|t|$, by the mean value theorem, and
thus in any $t$-range $\pm T$ the difference of (4.23) and (4.24) is less in modulus than

$$\eta T \left| \sum \left[\{F(\xi_{j+1}) - F(\xi_j)\} - \{F_n(\xi_{j+1}) - F_n(\xi_j)\}\right] \right| \le 2\eta T,$$

which is less than $\dfrac{\varepsilon}{4}$ if $\eta < \dfrac{\varepsilon}{8T}$.

Now the sum (4.24) will itself be less than $\varepsilon/4$ for some $n > n_0$, for it is the sum of a finite
number of terms each of which tends to zero. Consequently (4.23) is less than $\varepsilon/2$ and hence

$$|\phi(t) - \phi_n(t)| < \varepsilon, \qquad n > n_0.$$


Converse of the First Limit Theorem 

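4.12. The converse result is even more important:

Let $\phi_n$ be a sequence of characteristic functions corresponding to the sequence of
distribution functions $F_n$. Then if $\phi_n(t)$ tends to $\phi(t)$ for all real $t$,* uniformly in some finite
$t$-interval, $F_n$ tends to a distribution function $F$ and $\phi$ is the characteristic function of $F$.

* Or equivalently, if $\phi(t)$ is continuous at $t = 0$ or if $\phi$ is a characteristic function.

As a preliminary lemma, let us prove that if $F$ is a distribution function with
characteristic function $\phi$, then for all real $\xi$ and all $h > 0$

$$\frac{1}{h} \int_{\xi}^{\xi + h} F(u)\,du - \frac{1}{h} \int_{\xi - h}^{\xi} F(u)\,du = \frac{1}{\pi} \int_{-\infty}^{\infty} \left( \frac{\sin t}{t} \right)^2 e^{-2it\xi/h}\,\phi\!\left( \frac{2t}{h} \right) dt. \tag{4.25}$$

In fact, put

$$G(x) = \frac{1}{h} \int_{x}^{x + h} F(u)\,du.$$

This is a continuous distribution function and its characteristic function is

$$\int_{-\infty}^{\infty} e^{itx}\,dG = \frac{1}{h} \int_{-\infty}^{\infty} e^{itx}\{F(x + h) - F(x)\}\,dx,$$

which by a partial integration becomes

$$-\frac{1}{ith} \int_{-\infty}^{\infty} e^{itx}\{dF(x + h) - dF(x)\}$$

$$= -\frac{1}{ith} \int_{-\infty}^{\infty} \{e^{it(x - h)}\,dF(x) - e^{itx}\,dF(x)\}$$

$$= \frac{1 - e^{-ith}}{ith}\,\phi(t).$$

Substituting for $G(x)$ in (4.9) we get

$$\frac{1}{h} \int_{\xi + h}^{\xi + 2h} F(u)\,du - \frac{1}{h} \int_{\xi}^{\xi + h} F(u)\,du = \frac{1}{2\pi h} \int_{-\infty}^{\infty} \left( \frac{1 - e^{-ith}}{it} \right)^2 e^{-it\xi}\,\phi(t)\,dt,$$

whence, writing $\xi$ for $\xi + h$, we find

$$\frac{1}{h} \int_{\xi}^{\xi + h} F(u)\,du - \frac{1}{h} \int_{\xi - h}^{\xi} F(u)\,du = \frac{1}{2\pi h} \int_{-\infty}^{\infty} \left( \frac{1 - e^{-ith}}{it} \right)^2 e^{-it(\xi - h)}\,\phi(t)\,dt$$

$$= \frac{1}{\pi} \int_{-\infty}^{\infty} \left( \frac{\sin t}{t} \right)^2 e^{-2it\xi/h}\,\phi\!\left( \frac{2t}{h} \right) dt,$$

the result announced in (4.25).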
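
The lemma (4.25) can be checked numerically in a particular case. The sketch below, in Python, takes $F$ to be the standardised normal distribution function, with $\phi(t) = e^{-t^2/2}$, and compares the two sides by Simpson's rule; the choice of distribution, the values of $\xi$ and $h$, the cut-off `6h` and all function names are assumptions made for illustration. Only the real part of the right-hand side is computed, the imaginary part vanishing by symmetry in this case.

```python
import math

def simpson(f, a, b, n):
    # composite Simpson rule with n (even) subintervals
    h = (b - a) / n
    s = f(a) + f(b)
    for k in range(1, n):
        s += (4 if k % 2 else 2) * f(a + k * h)
    return s * h / 3

def normal_df(u):
    # distribution function of the standardised normal distribution
    return 0.5 * (1 + math.erf(u / math.sqrt(2)))

def sinc_squared(t):
    return 1.0 if t == 0.0 else (math.sin(t) / t) ** 2

def left_side(xi, h, n=2000):
    upper = simpson(normal_df, xi, xi + h, n)
    lower = simpson(normal_df, xi - h, xi, n)
    return (upper - lower) / h

def right_side(xi, h, n=20000):
    # phi(2t/h) = exp(-2 t^2 / h^2) is negligible beyond |t| = 6h
    cutoff = 6.0 * h
    integrand = lambda t: (sinc_squared(t) * math.cos(2 * t * xi / h)
                           * math.exp(-2 * t * t / (h * h)))
    return simpson(integrand, -cutoff, cutoff, n) / math.pi

lhs = left_side(0.3, 2.0)
rhs = right_side(0.3, 2.0)
```

The two values `lhs` and `rhs` should agree to the accuracy of the quadrature.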

Reverting now to the theorem required to be proved, note that it is sufficient to establish
that if $\phi_n \to \phi$ uniformly in some interval $|t| < a$, then $F_n$ tends to some distribution
function $F$ in every point of continuity of $F$. When this is established it follows from the
First Limit Theorem that $\phi$ is the characteristic function of $F$ and that $\phi_n$ converges to
$\phi$ uniformly in every finite $t$-interval.

As shown in 4.10, given a sequence $F_n$ we may always choose from it a subsequence
$F_{n'}$ such that $F_{n'}$ converges to a non-decreasing function $F$ in every continuity point of $F$.

Let us then choose such a sequence. We have of necessity $0 \le F \le 1$, and $F$ may
be supposed everywhere continuous on the left. It is then a distribution function if
$F(+\infty) - F(-\infty) = 1$, and this we proceed to prove.* From (4.25) with $\xi = 0$ we have

$$\frac{1}{h} \int_0^{h} F_{n'}(u)\,du - \frac{1}{h} \int_{-h}^{0} F_{n'}(u)\,du = \frac{1}{\pi} \int_{-\infty}^{\infty} \left( \frac{\sin t}{t} \right)^2 \phi_{n'}\!\left( \frac{2t}{h} \right) dt.$$

By hypothesis $\phi_n$ tends uniformly to $\phi$ for $|t| < a$ and hence $\phi_{n'}$ does so, and it is easily
seen that the integral on the right is uniformly convergent. Thus, given $\varepsilon$, we can find
$h_0$ such that for $h > h_0$

$$\frac{1}{h} \int_0^{h} F(u)\,du - \frac{1}{h} \int_{-h}^{0} F(u)\,du = \frac{1}{\pi} \int_{-\infty}^{\infty} \left( \frac{\sin t}{t} \right)^2 \phi\!\left( \frac{2t}{h} \right) dt + \eta,$$

where $|\eta| < \varepsilon$. Now let $h$ tend to infinity. As $F$ is a non-decreasing function the left-hand
side tends to $F(+\infty) - F(-\infty)$. The right-hand side tends, in virtue of the
uniformity of convergence of $\phi_n$ and the consequent continuity of $\phi$ near $t = 0$, to

$$\frac{1}{\pi} \int_{-\infty}^{\infty} \left( \frac{\sin t}{t} \right)^2 dt,$$

which is equal to unity.

* It is not obvious that if the functions $F_n$ all vary from 0 to 1, then their limit must do so. In
fact, if $F_n(x) = 0$, $x \le -n$; $F_n(x) = \tfrac{1}{2}$, $-n < x \le n$; $F_n(x) = 1$, $x > n$; then $\lim F_n(x) = \tfrac{1}{2}$ for
$-\infty < x < \infty$.

Hence $F$, the limit of the subsequence $F_{n'}$, is a distribution function whose characteristic
function is $\phi$.

But any subsequence of $\phi_n$ tends to $\phi$, in virtue of the uniformity of the convergence,
and hence any convergent subsequence of $F_n$ tends to $F$. Consequently $F_n$ tends to $F$ in
every point of continuity of $F$ and the theorem follows.

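Example 4.6

The binomial distribution $(q + p)^n$ considered in Example 3.2 has the characteristic
function

$$(q + pe^{it})^n.$$

Now the frequency at $x = j$ is $\binom{n}{j} q^{n-j} p^j$. This is greater than the ordinate at
$x = j + 1$ if

$$\binom{n}{j} q^{n-j} p^j > \binom{n}{j+1} q^{n-j-1} p^{j+1},$$

or $j > pn - q$.

For large $n$ the maximum frequency will then be in the neighbourhood of $j = pn$, and
is then

$$\binom{n}{pn} q^{qn} p^{pn}.$$

In virtue of Stirling's approximation to the factorial this approximates to

$$\frac{n^n e^{-n} \sqrt{2\pi n}\; q^{qn} p^{pn}}{(pn)^{pn} e^{-pn} \sqrt{2\pi pn}\;(qn)^{qn} e^{-qn} \sqrt{2\pi qn}} = \frac{1}{\sqrt{2\pi npq}}$$

and therefore tends to zero.

Thus every frequency in the binomial tends to zero and the distribution does not tend
to any limiting distribution.

Suppose, however, that we express the distribution in standard measure. Putting
$\xi = (x - \mu'_1)/\sigma$ we have

$$\phi_\xi(t) = \int_{-\infty}^{\infty} e^{it\xi}\,dF(\xi) = \int_{-\infty}^{\infty} e^{it(x - \mu'_1)/\sigma}\,dF(x).$$

Hence

$$\phi_\xi(t) = e^{-it\mu'_1/\sigma}\,\phi_x\!\left( \frac{t}{\sigma} \right).$$

The effect on $\phi$ of transferring to standard measure is then to replace $t$ by $t/\sigma$ and
to multiply by $e^{-it\mu'_1/\sigma}$.

For the binomial $\mu'_1 = np$, $\sigma^2 = npq$, and thus the characteristic function of the
binomial expressed in standard measure is

$$\phi(t) = \exp\left( -\frac{itnp}{\sqrt{npq}} \right) \left\{ q + p \exp\left( \frac{it}{\sqrt{npq}} \right) \right\}^n.$$

Then

$$\log \phi = -\frac{itnp}{\sqrt{npq}} + n \log\left\{ 1 + p\left( e^{it/\sqrt{npq}} - 1 \right) \right\}$$

$$= -\frac{itnp}{\sqrt{npq}} + n \left\{ \frac{itp}{\sqrt{npq}} - \frac{pt^2}{2npq} + \frac{p^2 t^2}{2npq} \right\} + O(t^3 n^{-1/2})$$

$$= -\tfrac{1}{2}t^2 + O(t^3 n^{-1/2}).$$

Thus for any finite $t$, $\log \phi$ tends uniformly to $-\tfrac{1}{2}t^2$ and hence $\phi \to e^{-t^2/2}$.

Thus the distribution $(q + p)^n$ expressed in standard measure tends to the distribution
whose characteristic function is $e^{-t^2/2}$, i.e. to the form

$$dF = \frac{1}{\sqrt{2\pi}}\,e^{-x^2/2}\,dx, \qquad -\infty \le x \le \infty.$$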
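
The convergence asserted in this example can be observed numerically. The following Python sketch evaluates the characteristic function of the binomial in standard measure and measures its distance from $e^{-t^2/2}$ for increasing $n$; the particular values of $p$ and $t$, and the function names, are illustrative assumptions.

```python
import cmath
import math

def binomial_standard_cf(t, n, p):
    # exp(-i t n p / s) * (q + p exp(i t / s))^n, with s = sqrt(n p q)
    q = 1 - p
    s = math.sqrt(n * p * q)
    return cmath.exp(-1j * t * n * p / s) * (q + p * cmath.exp(1j * t / s)) ** n

def distance_from_normal(t, n, p):
    # modulus of the difference from the normal characteristic function
    return abs(binomial_standard_cf(t, n, p) - math.exp(-t * t / 2))

errors = [distance_from_normal(1.0, n, 0.3) for n in (100, 10_000, 1_000_000)]
```

The successive entries of `errors` should decrease roughly in proportion to $n^{-1/2}$, in keeping with the order of the remainder term above.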




Multivariate Characteristic Functions


4.13. The characteristic function of a bivariate distribution $F(x_1, x_2)$ is defined as

$$\phi(t_1, t_2) = \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} e^{it_1x_1 + it_2x_2}\,dF(x_1, x_2) \tag{4.26}$$

and generally, that of a multivariate distribution $F(x_1, x_2, \ldots, x_n)$ as

$$\phi(t_1, t_2, \ldots, t_n) = \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} \ldots \int_{-\infty}^{\infty} e^{it_1x_1 + it_2x_2 + \ldots + it_nx_n}\,dF(x_1, x_2, \ldots, x_n). \tag{4.27}$$

If $x_1, \ldots, x_n$ are independent we have

$$\phi(t_1, t_2, \ldots, t_n) = \int_{-\infty}^{\infty} e^{it_1x_1}\,dF_1(x_1) \int_{-\infty}^{\infty} e^{it_2x_2}\,dF_2(x_2) \ldots \int_{-\infty}^{\infty} e^{it_nx_n}\,dF_n(x_n)$$

$$= \phi_1(t_1)\,\phi_2(t_2) \ldots \phi_n(t_n). \tag{4.28}$$

Similarly

$$\psi(t_1, t_2, \ldots, t_n) = \log \phi(t_1, t_2, \ldots, t_n) = \psi_1(t_1) + \psi_2(t_2) + \ldots + \psi_n(t_n). \tag{4.29}$$

Thus the characteristic function of the joint distribution of a number of independent
variables is the product of their characteristic functions; and the cumulative function
is the sum of their cumulative functions. This is a fundamentally important result in the
theory of sampling.
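
The product rule (4.28) and the additive rule (4.29) may be illustrated on a small discrete case. In the following Python sketch the two frequency tables `pmf1` and `pmf2` are invented for the purpose, and independence is imposed by taking the joint frequency as the product of the marginal frequencies.

```python
import cmath
from itertools import product

def cf(pmf, t):
    # characteristic function of a discrete variate given as {value: frequency}
    return sum(p * cmath.exp(1j * t * x) for x, p in pmf.items())

def joint_cf(pmf1, pmf2, t1, t2):
    # joint characteristic function of two independent discrete variates
    return sum(
        p1 * p2 * cmath.exp(1j * (t1 * x1 + t2 * x2))
        for (x1, p1), (x2, p2) in product(pmf1.items(), pmf2.items())
    )

pmf1 = {0: 0.2, 1: 0.5, 3: 0.3}      # hypothetical frequencies
pmf2 = {-1: 0.4, 2: 0.6}             # hypothetical frequencies

t1, t2 = 0.2, 0.1
joint = joint_cf(pmf1, pmf2, t1, t2)
product_of_marginals = cf(pmf1, t1) * cf(pmf2, t2)
sum_of_cumulative = cmath.log(cf(pmf1, t1)) + cmath.log(cf(pmf2, t2))
```

For small $t_1$ and $t_2$, as here, the logarithm of `joint` agrees with `sum_of_cumulative`; for larger arguments the two may differ by a multiple of $2\pi i$ owing to the choice of branch of the logarithm.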


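4.14. In generalisation of (4.9) we have

$$F(x_1, x_2, \ldots, x_n) - F(0, 0, \ldots, 0) = \frac{1}{(2\pi)^n} \int_{-\infty}^{\infty} \ldots \int_{-\infty}^{\infty} \frac{1 - e^{-ix_1t_1}}{it_1} \ldots \frac{1 - e^{-ix_nt_n}}{it_n}\,\phi(t_1, t_2, \ldots, t_n)\,dt_1 \ldots dt_n. \tag{4.30}$$

The multiple integrals are to be interpreted as principal values

$$\lim_{c \to \infty} \int_{-c}^{c} \ldots \int_{-c}^{c}.$$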

The proof is similar to that for the univariate case. We have 

J = r ... . . . dx„ = 

Jo Jo a;. 


lim r ...f 

■—>•00 J —00 J—00 \p !P / ^1 


sm Xi Bin 


dxi . . . dx^ — 7 i^H( + 0 . . , + 0 ) 


31) 


lim f . . . f H{x^, .. . dx^... dx„ = ... 0 ). 

p-> OO J -~eo J-*O0 

Considering now 

= f • • •[ ^(^1» * • • ' • '(* • • • — (^‘ 

J -c J —c Jo Jo 

we find that 

r® C® 2 8incxi 2 sin ca-_ f„,~ , v , \ 

== ... -;—1-’'{F(A, +Xi--2fn + ^n) 

J — CO J — QO 

— F{ii, ... x„)}dxi. . . dx„. 

lim {2n)”{F(Xi, . . . xj — F(0, ... 0)} 

C —> 00 

and by considering the integration of (4.31) with respect to the f’s the result (4.30) follows. 


4.15. If we have a distribution $F(x)$ and some function of the variate such as $\xi(x)$
we may consider the characteristic function of $\xi$,

$$\phi_\xi(t) = \int_{-\infty}^{\infty} e^{it\xi(x)}\,dF(x). \tag{4.32}$$

The distribution of $\xi$ will then be given by (4.9) or (4.10), e.g. the frequency function of
$\xi$, say $g(\xi)$, is

$$g(\xi) = \frac{1}{2\pi} \int_{-\infty}^{\infty} \exp(-it\xi)\,\phi_\xi(t)\,dt. \tag{4.33}$$


The Problem of Moments 

4.16. We can now consider in more detail a problem which suggested itself in Chapter 
3. Do the moments determine the distribution uniquely, and if not, under what conditions 
do they do so ? To give some point to this question let us note that in some circumstances 
it is possible for two different distributions to have the same set of moments. 

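Consider in fact the integral

$$\int_0^{\infty} t^{p-1} e^{-qt}\,dt = \frac{\Gamma(p)}{q^p}, \qquad p > 0,\ R(q) > 0.$$

Put

$$p = \frac{n + 1}{\lambda}, \qquad q = \alpha + i\beta, \qquad \frac{\beta}{\alpha} = \tan \lambda\pi, \qquad x^\lambda = t,$$

$n$ a non-negative integer, $0 < \lambda < \tfrac{1}{2}$.

We find on substitution that

$$\lambda \int_0^{\infty} x^n e^{-\alpha x^\lambda} \{\cos \beta x^\lambda - i \sin \beta x^\lambda\}\,dx = \frac{\Gamma\!\left( \dfrac{n+1}{\lambda} \right)}{\alpha^{(n+1)/\lambda}\,(1 + i \tan \lambda\pi)^{(n+1)/\lambda}} \tag{4.34}$$

and since

$$(1 + i \tan \lambda\pi)^{(n+1)/\lambda} = \frac{\cos (n + 1)\pi + i \sin (n + 1)\pi}{(\cos \lambda\pi)^{(n+1)/\lambda}} = \text{a real quantity},$$

the imaginary part of (4.34) is zero. Thus the distributions

$$f(x) = k e^{-\alpha x^\lambda}\{1 + \varepsilon \sin(\beta x^\lambda)\}, \tag{4.35}$$

$$0 \le x \le \infty, \quad \alpha > 0, \quad 0 < \lambda < \tfrac{1}{2}, \quad |\varepsilon| < 1,$$

have moments independent of $\varepsilon$, and (4.35) defines a whole family of distributions having
the same moments.

Similarly, if we substitute

$$p = \frac{2n + 1}{\mu}, \qquad q = \alpha + i\beta, \qquad \frac{\beta}{\alpha} = \tan \frac{\mu\pi}{2}, \qquad x^\mu = t, \qquad \mu = \frac{2s}{2s + 1} \quad (s \text{ a positive integer}),$$

we find that the family

$$f(x) = k e^{-\alpha |x|^\mu}\{1 + \varepsilon \cos(\beta |x|^\mu)\}, \tag{4.36}$$

$$-\infty \le x \le \infty, \quad \alpha > 0, \quad 0 < \mu = \frac{2s}{2s+1} < 1, \quad |\varepsilon| < 1,$$

all have the same moments, the range in this case being infinite in both directions.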
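
That the moments of the family (4.35) do not depend on $\varepsilon$ amounts to the statement that $\int_0^\infty x^n e^{-\alpha x^\lambda} \sin(\beta x^\lambda)\,dx$ vanishes for every non-negative integer $n$. The Python sketch below checks this numerically for the particular case $\lambda = \tfrac{1}{4}$, $\alpha = 1$ (so that $\beta = 1$), after the substitution $t = x^\lambda$; the parameter values, the cut-off `T` and the function names are assumptions made for illustration.

```python
import math

def simpson(f, a, b, n):
    # composite Simpson rule with n (even) subintervals
    h = (b - a) / n
    s = f(a) + f(b)
    for k in range(1, n):
        s += (4 if k % 2 else 2) * f(a + k * h)
    return s * h / 3

LAM = 0.25                                  # lambda, assumed in (0, 1/2)
ALPHA = 1.0
BETA = ALPHA * math.tan(LAM * math.pi)      # beta / alpha = tan(lambda * pi)

def moment_parts(n, T=80.0, steps=40000):
    # With t = x**LAM, x^n dx becomes (1/LAM) t^{(n+1)/LAM - 1} dt.
    power = (n + 1) / LAM - 1
    base = simpson(lambda t: t**power * math.exp(-ALPHA * t), 0.0, T, steps) / LAM
    perturbation = simpson(
        lambda t: t**power * math.exp(-ALPHA * t) * math.sin(BETA * t), 0.0, T, steps
    ) / LAM
    return base, perturbation

ratios = []
for n in range(3):
    base, perturbation = moment_parts(n)
    ratios.append(abs(perturbation) / base)
```

Each entry of `ratios` compares the contribution of the $\varepsilon$ term with the moment for $\varepsilon = 0$, and should be zero to within the error of the quadrature.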


4.17. In full generality the problem of moments may be formulated as follows:
Given a sequence of constants $c_0, c_1, \ldots, c_j, \ldots$

(i) Does there exist a distribution function $F$ such that

$$\int_a^b x^j\,dF = c_j\ ? \tag{4.37}$$

(ii) If so, is the distribution function unique ?

(iii) What are the functions, if any ?

We have not the space here to enter on a full discussion of these questions, which have
stimulated some beautiful mathematics, particularly by Stieltjes (1918). Our treatment
will be confined to the results of statistical interest, but we may indicate the principal
results of Stieltjes.

If we express the series

$$\sum_{j=0}^{\infty} (-1)^j \frac{c_j}{z^{j+1}} \tag{4.38}$$

as a continued fraction of the form

$$\cfrac{1}{a_1 z + \cfrac{1}{a_2 + \cfrac{1}{a_3 z + \cfrac{1}{a_4 + \ldots + \cfrac{1}{a_{2n-1} z + \cfrac{1}{a_{2n} + \ldots}}}}}} \tag{4.39}$$

then, if the limits in (4.37) are 0 to $\infty$, it is a necessary and sufficient condition for the
existence of at least one $F$ that all the $a$'s be positive; and $F$ is unique or not according
as $\sum a_j$ diverges or converges.


The case when the limits in (4.37) are $\pm\infty$ has been treated by Hamburger (1920),
who showed that an $F$ exists if the expression of (4.38) as a continued fraction of the form

^0 


OfQ z *4“ Ui “4“ ^ z 


. (4.40) 


gives positive values of the $b$'s. In order that $F$ may be unique it is necessary and sufficient
that the continued fraction be completely convergent in a sense defined by Hamburger.

We shall see presently that for finite limits in the integral of (4.37) the function $F$ is
always unique.

4.18. Unfortunately the Stieltjes-Hamburger criteria are not of much practical use
because, as a rule, it is too difficult to express the $a$'s and $b$'s of (4.39) and (4.40) explicitly
enough in terms of the given $c$'s to enable questions of sign or convergence to be decided.
We may, however, derive some criteria of statistical importance by considering the more
restricted problem: given the moments of a distribution, can any other distribution also
have the moments ? In other words, we are given the existence of one $F$ and require
to know whether $F$ is unique.

Note in the first instance that this problem need only be considered when absolute
moments of all orders exist. It is evident that more than one distribution can exist having
a limited number of moments finite and the remainder infinite. Furthermore, if any
moment of even order exists, those of lower order must exist. In particular, if $\mu_{2r}$ exists,
$\int_0^{\infty} x^{2r}\,dF$ and $\int_{-\infty}^{0} x^{2r}\,dF$ exist separately, and hence so do $\int_0^{\infty} |x^{2r-1}|\,dF$ and
$\int_{-\infty}^{0} |x^{2r-1}|\,dF$, and so also does $\int_{-\infty}^{\infty} |x^{2r-1}|\,dF$, the absolute moment of order $2r - 1$.

Thus we consider only the case when all absolute moments exist.


4.19. We will prove in the first place the theorem that a set of moments determines
a distribution uniquely if the series $\sum \dfrac{\nu_j t^j}{j!}$ converges for some real non-zero $t$.

The characteristic function is continuous in $t$ and its derivatives exist if the moments
exist. We have then in the neighbourhood of $t = 0$

$$\phi(t) = \sum_{j=0}^{r-1} \mu'_j \frac{(it)^j}{j!} + R_r, \tag{4.41}$$

where $R_r$ is less in absolute value than $\dfrac{\nu_r |t|^r}{r!}$ (cf. 3.14).

Thus if $\sum \dfrac{\nu_j t^j}{j!}$ converges, $R_r$ tends to zero and hence $\phi(t)$ is equal to the sum of the
infinite series $\sum \mu'_j \dfrac{(it)^j}{j!}$ if it exists. Moreover, this series is majorated by $\sum \dfrac{\nu_j |t|^j}{j!}$ and hence
is absolutely convergent if the latter is convergent. Hence we have

$$\phi(t) = \sum_{j=0}^{\infty} \mu'_j \frac{(it)^j}{j!} \tag{4.42}$$

and thus $\phi(t)$ is uniquely determined in the neighbourhood of $t = 0$. In the neighbourhood
of $t = t_0$ we have

$$\phi(t) = \sum_{j=0}^{r-1} \phi^{(j)}(t_0) \frac{(t - t_0)^j}{j!} + R_r$$

and the modulus of the coefficient of $\dfrac{(t - t_0)^j}{j!}$ is not greater than $\nu_j$. Consequently $\phi(t)$
can be expanded everywhere as a convergent Taylor series and is equal to the sum of that
series. Hence $\phi(t)$ may be extended from the neighbourhood of $t = 0$ by analytic continuation
through any finite $t$-interval. Hence $\phi(t)$ is everywhere uniquely defined.

But $\phi(t)$ determines the distribution function and hence the latter is uniquely
determined.


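4.20. As a corollary of this theorem we have the result that a set of moments uniquely
determines a distribution if

$$\limsup_{n \to \infty} \frac{\nu_n^{1/n}}{n} \text{ is finite.} \tag{4.43}$$

For the series $\sum \dfrac{\nu_n t^n}{n!}$ is convergent if

$$\limsup_{n \to \infty} \left( \frac{\nu_n t^n}{n!} \right)^{1/n} < 1,$$

that is to say, in virtue of the Stirling approximation to the factorial, if

$$\limsup_{n \to \infty} \frac{\nu_n^{1/n}\,t\,e}{n} < 1.$$

If $k$ is the upper limit of $\dfrac{\nu_n^{1/n}}{n}$ this inequality will be satisfied for $t < \dfrac{1}{ke}$.

It is also a sufficient condition for uniqueness that $\limsup \dfrac{\mu_{2n}^{1/2n}}{2n}$ should be finite, a form
of the criterion which enables us to disregard the absolute moments. In fact

$$\nu_{2n-1}^{1/(2n-1)} \le \nu_{2n}^{1/2n} \le \nu_{2n+1}^{1/(2n+1)},$$

so that

$$\frac{2n - 1}{2n} \cdot \frac{\nu_{2n-1}^{1/(2n-1)}}{2n - 1} \le \frac{\nu_{2n}^{1/2n}}{2n} \le \frac{2n + 1}{2n} \cdot \frac{\nu_{2n+1}^{1/(2n+1)}}{2n + 1}.$$

Taking upper limits throughout we have

$$\limsup \frac{\nu_{2n-1}^{1/(2n-1)}}{2n - 1} \le \limsup \frac{\mu_{2n}^{1/2n}}{2n} \le \limsup \frac{\nu_{2n+1}^{1/(2n+1)}}{2n + 1},$$

and thus $\limsup \dfrac{\nu_n^{1/n}}{n}$ and $\limsup \dfrac{\mu_{2n}^{1/2n}}{2n}$ are finite or infinite together.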





4.21. As a further corollary we have the result that a set of moments uniquely
determines a distribution if the range is finite. For suppose the range is $a$ to $b$. Taking an
origin at $x = a$ and letting $b - a = c$, we have

$$\mu'_n = \int_0^{c} x^n\,dF \le c^n.$$

Thus

$$\frac{(\mu'_n)^{1/n}}{n} = \frac{\nu_n^{1/n}}{n} \le \frac{c}{n}$$

and hence

$$\lim_{n \to \infty} \frac{\nu_n^{1/n}}{n} = 0.$$


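4.22. Two further criteria may be mentioned. The first is due to Carleman (1926).
A set of moments determines a distribution uniquely if (in the case of limits $-\infty$ to $+\infty$)

$$\sum_{j=0}^{\infty} \frac{1}{(\mu_{2j})^{1/2j}} \qquad (4.44)$$

diverges. For the limits 0 to $\infty$ the corresponding series is

$$\sum_{j=0}^{\infty} \frac{1}{(\mu'_j)^{1/2j}}. \qquad (4.45)$$

Secondly, if there exists a frequency function, the moments determine it uniquely
if, for limits $-\infty$ to $+\infty$,

$$f(x) < M |x|^{\beta - 1} \exp(-\alpha |x|^{\lambda}) \ \text{for} \ |x| > x_0 \quad (M, \beta, \alpha > 0, \; \lambda \ge 1), \qquad (4.46)$$

and for limits 0 to $\infty$

$$f(x) < M |x|^{\beta - 1} \exp(-\alpha |x|^{\lambda}) \ \text{for} \ |x| > x_0 \quad (M, \beta, \alpha > 0, \; \lambda \ge \tfrac{1}{2}). \qquad (4.47)$$

This result is due ultimately to Stieltjes. It follows without difficulty from the Carleman
criterion.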

It is interesting to note that if for some $x_0$

$$f(x) > \exp(-\alpha |x|^{\lambda}) \quad (\alpha > 0, \; |x| > x_0) \qquad (4.48)$$

then the problem of moments is necessarily indeterminate (as usual, $\lambda < \tfrac{1}{2}$ for the range
0 to $\infty$ and $\lambda < 1$ for the range $-\infty$ to $+\infty$). This follows from the examples in
equations (4.35) and (4.36), for we can add to (4.48), without rendering any frequency
negative, a function all of whose moments are zero.


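Example 4.7

The moments of the distribution

$$dF = \frac{1}{\sigma \sqrt{2\pi}} e^{-x^2 / 2\sigma^2} \, dx, \quad -\infty < x < \infty,$$

are given by

$$\mu_{2r+1} = 0, \qquad \mu_{2r} = \frac{(2r)!}{2^r r!} \sigma^{2r}.$$

Hence

$$\frac{1}{n} \mu_{2n}^{1/2n} = \frac{\sigma}{n} \left\{ \frac{(2n)!}{2^n n!} \right\}^{1/2n} \sim \frac{\sigma}{n} \left\{ \frac{(2n)^{2n} e^{-2n} \sqrt{4\pi n}}{2^n n^n e^{-n} \sqrt{2\pi n}} \right\}^{1/2n} \sim \frac{2\sigma}{\sqrt{2en}},$$

and thus the upper limit is zero and the distribution is unique.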
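The limit in Example 4.7 may be checked numerically. The sketch below assumes the even moments $\mu_{2n} = (2n)!\,\sigma^{2n} / (2^n n!)$ and compares $\mu_{2n}^{1/2n} / n$ with the Stirling form $2\sigma / \sqrt{2en}$; the function names are illustrative only.

```python
import math

def normal_criterion(n, sigma=1.0):
    """(1/n) * mu_{2n}^(1/(2n)) with mu_{2n} = (2n)! sigma^(2n) / (2^n n!)."""
    log_mu = (math.lgamma(2 * n + 1) - n * math.log(2.0)
              - math.lgamma(n + 1) + 2 * n * math.log(sigma))
    return math.exp(log_mu / (2 * n)) / n

def stirling_form(n, sigma=1.0):
    """Asymptotic value 2*sigma / sqrt(2*e*n) obtained from Stirling's formula."""
    return 2.0 * sigma / math.sqrt(2.0 * math.e * n)

for n in (10, 100, 1000):
    print(n, normal_criterion(n), stirling_form(n))
```

Working with logarithms of the factorials avoids overflow for large $n$; both columns decrease towards zero together.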
4.23. If the moment of order $r$ exists it must be given by

$$\mu'_r = \frac{1}{i^r} \left[ \frac{d^r \phi(t)}{dt^r} \right]_{t=0}.$$

Thus if $\phi(t)$ can be expanded in an infinite Taylor series, that series must be $\sum \mu'_r (it)^r / r!$.
Further, if this series does not converge, $\phi(t)$ cannot be expanded as an infinite Taylor
series. But it can always be expanded in the finite form with remainder




Thus, when the series does not converge, $\phi(t)$ can be expanded in powers of $t$ only
asymptotically.

This illustrates the source of the ambiguity in the definition of $\phi(t)$ when the infinite
series $\sum \mu'_j (it)^j / j!$ does not converge, for it is known that there exist an infinite number of
functions which have a given set of coefficients in an asymptotic expansion. For instance,
if $a(t)$ has an asymptotic expansion in $t$ the functions $a(t) + k e^{-1/t}$ all have the same
expansion. It is therefore hardly surprising that when $\sum \mu'_j (it)^j / j!$ or $\sum \nu_j t^j / j!$ fail to converge,
there may be more than one frequency or distribution function with the same set of
moments.

But it does not follow from what has been said that there must be more than one 
frequency-distribution. There must be more than one function, but those functions may 
not qualify as frequency-distributions, e.g. they may be negative in part of the range. 
In the example just given, the function with the added term cannot be a characteristic
function, for it does not obey the well-known condition that $\phi(t)$ and $\phi(-t)$ should be
conjugate. So far as I am aware, it is not known whether the condition that $\sum \nu_j t^j / j!$
should converge is necessary as well as sufficient for uniqueness.


The Second Limit Theorem 

4.24. We are now in a position to prove a further theorem on the limits of distribution
functions. If a sequence of functions $F_n(x)$ has all moments $\mu'_j(n)$ existing and for all
$j$, $\mu'_j(n) \to \mu'_j$, then the $\mu'_j$ are the moments of a distribution function $F$ which is the
limit of the sequence $F_n$, provided that $F$ is completely determined by its moments.





We will first prove the rather more general theorem: If there is given a sequence of
distribution functions $F_n$ such that all moments of $F_n$ exist, and for any $j$ the sequence
$\mu'_j(n)$ lies between fixed limits independent of $n$, then a subsequence $F_{n'}$ can be selected
from $F_n$ such that

(1) $\lim_{n' \to \infty} \int_{-\infty}^{\infty} x^j \, dF_{n'}$ exists, $= \mu'_j$ say.

(2) The subsequence $F_{n'}$ converges to some distribution function $F$.

(3) $\int_{-\infty}^{\infty} x^j \, dF$ exists and is equal to $\mu'_j$.

The existence of $\mu'_j$ may be proved by the diagonal method exactly as for the Montel-Helly
theorem of 4.10. By hypothesis, $\mu'_j(n)$ is uniformly bounded and the rest of the
proof follows that of 4.10.

The existence of $F$ follows also from the Montel-Helly theorem. We apply the theorem
to the subsequence derived by satisfying condition (1) and hence arrive at a subsequence
obeying both (1) and (2). It must however be shown that $F$ is a distribution function,
i.e. varies effectively from 0 to 1. This follows because


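$$\int_b^{\infty} x^r \, dF_n \le \frac{1}{b^{r+2}} \int_b^{\infty} x^{2r+2} \, dF_n \le \frac{\mu'_{2r+2}(n)}{b^{r+2}}, \quad b > 1,$$

$$\left| \int_{-\infty}^{a} x^r \, dF_n \right| \le \frac{\mu'_{2r+2}(n)}{|a|^{r+2}}, \quad a < -1, \qquad (4.49)$$

and hence, for the subsequence, with $r = 0$ and letting $n'$ tend to infinity,

$$0 \le 1 - F(b) \le \frac{\mu'_2}{b^2}, \qquad 0 \le F(a) \le \frac{\mu'_2}{a^2},$$

so that, as $-a$, $b$ tend to infinity the equations $F(\infty) = 1$, $F(-\infty) = 0$ are seen to hold.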

We also require for later parts of the proof two results: the first that the convergence
of

$$\lim_{a \to -\infty, \; b \to \infty} \int_a^b x^r \, dF_n \quad \text{to} \quad \int_{-\infty}^{\infty} x^r \, dF_n \qquad (4.50)$$

is uniform with respect to $n$. This follows from the hypothesis that $\mu'_{2r+2}(n)$ is bounded
and from the equations (4.49). The second is that

$$\lim_{x \to \infty} x^s \{1 - F_n(x)\} = 0, \qquad \lim_{x \to -\infty} |x|^s F_n(x) = 0 \qquad (4.51)$$

for $s > 0$ and all integral $n > 0$. The first limit follows from

and hence from 

6* {1 - F^{b )} < 6 > 0, 0<s<2j. 

We now have to complete the proof by showing that $\int_{-\infty}^{\infty} x^j \, dF$ exists and is equal
to $\mu'_j$. For this we use the theorem (an extension by Fréchet and Shohat (1931) of one
due to Helly) that if a sequence $v_n(x)$, defined in the interval $-\infty$ to $+\infty$, is such that

(1) $v_n(x)$ is of bounded variation in any finite interval,

(2) all $v_n(x)$ and their total variations are bounded in any finite interval,

(3) $\lim_{n \to \infty} v_n(x) = v(x)$ exists for all $x$, except perhaps at a denumerable number
of points,

(4) $\int_a^b f(x) \, dv_n(x)$ converges uniformly with respect to $n$ to $\int_{-\infty}^{\infty} f(x) \, dv_n(x)$ if $f(x)$ is
everywhere continuous,

then $\int_{-\infty}^{\infty} f(x) \, dv(x)$ exists and $= \lim_{n \to \infty} \int_{-\infty}^{\infty} f(x) \, dv_n(x)$.


This result may be applied to our sequence $F_n(x)$, which obeys conditions (1), (2)
and (3). It also obeys (4) when $f(x) = x^j$ in virtue of (4.50) and (4.51). Further $F(x)$
is of bounded variation and hence $\int_{-\infty}^{\infty} x^j \, dF(x)$ exists and equals, say, $\mu'_j$.

Finally

$$|\mu'_j - \mu'_j(n)| = \left| \int_{-\infty}^{\infty} x^j (dF - dF_n) \right| \le \left| \int_{-\infty}^{a} x^j \, dF \right| + \left| \int_{-\infty}^{a} x^j \, dF_n \right| + \left| \int_b^{\infty} x^j \, dF \right| + \left| \int_b^{\infty} x^j \, dF_n \right| + \left| \int_a^b x^j (dF - dF_n) \right|. \qquad (4.52)$$

By taking $-a$ and $b$ sufficiently large we can make the first four terms on the right as
small as we please, for $a < a_0$, $b > b_0$ and some $n > n_0$. Then by taking $n$ sufficiently
large we can make the fifth term as small as we please (without affecting the smallness
of the other terms). Hence $|\mu'_j - \mu'_j(n)|$ may be made as small as we please.

This establishes the more general result. The theorem enunciated at the beginning
of the section follows as a corollary. In fact, if $\mu'_j(n)$ tends to a limit $\mu'_j$ then the
subsequence $F_{n'}$ can always be selected and tends to a distribution function $F$ with the
moments $\mu'_j$. All we have to prove is that if the $\mu'_j$ are such that they uniquely determine
$F$, the sequence $F_n$ itself converges to $F$.

Suppose that there exists a point of continuity $x_0$ such that $F_n(x_0)$ does not converge
to $F(x_0)$. Then a subsequence $F_{n'}(x)$ can be selected which converges to some other value
at $x_0$. But from this we can select a subsequence $F_{n''}(x)$ converging say to $F_1(x)$, having
the same moments as $F(x)$. Since by hypothesis these moments uniquely determine
$F$, $F_1$ must be the same as $F$ in all points of continuity, i.e.

$$\lim_{n'' \to \infty} F_{n''}(x_0) = F(x_0).$$

This is impossible, for $F_{n''}(x_0)$ is a subsequence of $F_{n'}(x_0)$ which converges, but not to
$F(x_0)$.

4.25. The above proof can hardly be described as easy, though it depends only on 
simple notions such as continuity and convergence, but the Second Limit Theorem is so 
important that it has seemed worth while reproducing the proof in full. Many examples 
of its application will occur in the sequel. The chapter may be concluded with an 
illustration of its use in determining the limiting forms of distributions. 





Example 4.8

The discontinuous distribution whose frequency at $x = j$ $(j = 0, 1, \ldots)$ is $e^{-m} m^j / j!$
has a characteristic function

$$\phi(t) = \exp\{m(e^{it} - 1)\},$$

and hence all cumulants equal to $m$.

The distribution is evidently the only one with such cumulants, for

$$\sum_{j=1}^{\infty} \kappa_j \frac{(it)^j}{j!} = m \sum_{j=1}^{\infty} \frac{(it)^j}{j!}$$

is convergent and equals $m(e^{it} - 1)$, so that the cumulative function and the characteristic
function are uniquely determined.

Now as $m$ tends to infinity the frequency $e^{-m} m^j / j!$ at $x = j$ tends to zero and thus the
distribution does not tend to a limit. This is consistent with the behaviour of the
cumulants, which increase without limit.

Suppose, however, we express the distribution in standard measure. Then

$$\kappa_r = \frac{m}{m^{r/2}} = \frac{1}{m^{(r-2)/2}}.$$

Hence as $m \to \infty$ all cumulants higher than the second tend to zero, and hence the
cumulants of the distribution tend to those of the normal distribution


dF = —- ^ dXf — C 30 < u; < oo 

with the mean w*. 

Now we know that this distribution is completely determined by its moments
(Example 4.7). We also know that the cumulants determine the moments and vice-versa,
so that if the cumulants of the discontinuous distribution tend to those of the normal
distribution, the moments will tend to the moments of that distribution. Hence the
Second Limit Theorem is applicable, and the discontinuous distribution does in fact tend
to the normal form when expressed in standard measure.
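The conclusion of Example 4.8 can be illustrated numerically. The sketch below is ours and rests on one assumption about "standard measure": the variate is measured from its mean $m$ in units of its standard deviation $\sqrt{m}$. It then finds the largest gap, over the points $x = j$, between the distribution function of the discontinuous distribution and that of the unit normal.

```python
import math

def poisson_cdf_values(m):
    """Return [(j, F(j))] for the distribution with frequency e^(-m) m^j / j! at x = j."""
    upper = int(m + 10 * math.sqrt(m)) + 1
    out, running = [], 0.0
    for j in range(upper + 1):
        running += math.exp(j * math.log(m) - m - math.lgamma(j + 1))
        out.append((j, running))
    return out

def normal_cdf(z):
    """Distribution function of the normal distribution with zero mean and unit variance."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def max_gap(m):
    """Largest |F_m(j) - Phi((j - m)/sqrt(m))| over the lattice points j."""
    return max(abs(f - normal_cdf((j - m) / math.sqrt(m)))
               for j, f in poisson_cdf_values(m))

for m in (10, 100, 1000):
    print(m, max_gap(m))
```

The gap shrinks roughly in proportion to $1/\sqrt{m}$, which is the behaviour the Second Limit Theorem leads us to expect.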


NOTES AND REFERENCES


The idea of the characteristic function can be traced back as far as Laplace, but its
introduction into the theory of statistics, through the theory of probability, is mainly due
to Poincaré and Lévy (1925), whose book provides the most readable and complete account
of the function. More recent researches are outlined by Cramér (1937). The proof of the
First Limit Theorem is substantially that given by Lévy. The converse, given originally
in a somewhat less general form by Lévy, was proved simultaneously by him and Cramér,
the proof in 4.12 following the latter's.

The Second Limit Theorem seems to have been first proved by Markoff for the case
when the limiting form is the normal distribution $dF = \frac{1}{\sqrt{2\pi}} e^{-x^2/2} \, dx$. It was subsequently
considered and extended by several writers, the general form of 4.24 being due to Fréchet
and Shohat (1931), whose proof has been closely followed. Some references to prior work
are given by these authors.

The problem of moments appears to have been first considered and solved by
Tchebycheff. The memoir by Stieltjes (1918; the memoir being first published in 1894)
is classical. For some subsequent work see Hamburger (1920) and Carleman (1925).

Carleman, T. (1925), Les fonctions quasi-analytiques, Gauthier-Villars, Paris.

Cramér, H. (1937), Random Variables and Probability Distributions, Cambridge University
Press.

Fréchet, M., and Shohat, J. (1931), “A Proof of the Generalised Second-Limit Theorem,”
Trans. Am. Math. Soc., 33, 533.

Hamburger, H. (1920, 1921), “Über eine Erweiterung des Stieltjesschen Momentenproblems,”
Math. Annalen, 81, 236, and 82, 120 and 168.

Lévy, P. (1925), Calcul des probabilités, Gauthier-Villars, Paris.

Stieltjes, J. (1918), Recherches sur les fractions continues, Œuvres, Groningen.


EXERCISES 

4.1. Show that if a frequency function $f(x)$ is symmetrical the characteristic function
is an even function, i.e. $\phi(t) = \phi(-t)$, and that therefore $\phi(t)$ is real; and conversely,
if $\phi(t)$ is real the frequency function, if any, is symmetrical.


4.2. Show that the function 

m 

is the characteristic function of a distribution function 


^ n a positive integer, 
a distribution function 




4.3. Show that the factorial moment-generating function $\omega(t)$ of the binomial $(q + p)^n$
is $(1 + pt)^n$, and hence that

$$\mu'_{[r]} = n^{[r]} p^r.$$


4.4. If for a certain distribution

$$\kappa_r = b a^r,$$

$a$ and $b$ being positive constants, show that the distribution is discontinuous with variate-values
$0, a, \ldots, ra, \ldots$ and the frequency at $ra$ equal to $e^{-b} b^r / r!$.

4.5. Show that the function cannot be a characteristic function unless a « 2. 


4.6. Show that there is only one distribution with moments given by 

r(v -f r) 





4.7. A theorem due to Weierstrass states that any function continuous in the range
$(a, b)$ can be represented by a uniformly convergent series of polynomials $\sum_{n=0}^{\infty} P_n(x)$, $P_n(x)$
being of degree $n$ in $x$. Deduce that if two continuous frequency functions, $f_1$ and $f_2$,
have the same moments of all orders,

$$\int_a^b (f_1 - f_2)^2 \, dx = 0,$$

and hence that the moments determine a distribution uniquely if it is continuous and of
finite range.

4.8. If0 is a non-negative function of the variate x and 

a(<) = r B*dF{x), 

J —oO 

show that the frequency function of 0, if any, is given by 

4.9. Show that if a characteristic function <^(/) possesses derivatives up to and 
including the second order, then 



and generalise this result. 

4.10. A theorem of Denjoy (Comptes rendus, 1921, 173, 1399) states that if a function
$f(x)$ defined in a range $(a, b)$ possesses derivatives of all orders, if $M_j$ is the maximum of
$|f^{(j)}(x)|$ in the range and if $\sum 1 / M_j^{1/j}$ is divergent, then $f(x)$ is completely determined by
its value and that of its derivatives at a single point. Use this result to show that a set
of moments determines a distribution uniquely if $\sum 1 / \nu_j^{1/j}$ diverges.



CHAPTER 5 


STANDARD DISTRIBUTIONS—(1) 

5.1. There are certain distribution and frequency functions which, for both theoretical
and practical reasons, occupy a central position in statistical theory. In this and the next
chapter we shall consider their properties, leaving their statistical uses to be developed
and illustrated later in the book. We shall, however, indicate briefly some of the ways
in which they arise, even at the expense of anticipating ideas introduced at a subsequent
stage. This will not impair the logical continuity of our development and will give concreteness
to a treatment which might otherwise appear somewhat abstract.


The Binomial Distribution 

5.2. Suppose we have a large population of members each of which exhibits either
some quality P or a complementary quality Q (= not-P), for example, a population of
men who are either blue-eyed or not-blue-eyed. Suppose that the proportion of individuals
with quality P is $p$ and that with quality Q is $q$, where of course $p + q = 1$. If we take
a random sample of $N$ members from the population we expect that on the average $Np$
members will exhibit P and $Nq$ will exhibit Q. We may thus array the members according
to the quality as

$$N(p + q).$$

Now suppose we choose $N$ pairs of individuals. There will be pairs PP, pairs PQ, pairs QP
and pairs QQ. Of the $Np$ pairs for which the first member is P there will, on the average,
be a proportion $p$ for which the second member is P and $q$ for which it is Q. Similarly
for the $Nq$ exhibiting Q in the first member. Thus the pairs may be arrayed as

$$Np(p + q) + Nq(p + q) = N(p + q)^2.$$

Generally if we choose $N$ sets of $n$ the array will be $N(p + q)^n$. That is to say, the
proportion of cases containing $j$ P's and $(n - j)$ Q's will be $\binom{n}{j} p^j q^{n-j}$, the term in $p^j q^{n-j}$
in $(p + q)^n$. We are then led to consider the binomial distribution

$$f = (p + q)^n \qquad (5.1)$$

as a discontinuous frequency-distribution, the variate being the number of P's in the set
of $n$, which may vary from $n$ to 0. If, as is frequently more convenient, we wish to consider
the variate as increasing from 0 to $n$, the distribution is inverted, i.e. becomes

$$f = (q + p)^n. \qquad (5.2)$$
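The arraying argument can be mimicked step by step. In the sketch below (our own illustration, with hypothetical function names) a set is built up one member at a time, each member being a P with proportion $p$ or a Q with proportion $q$; the resulting proportions are then compared with the terms $\binom{n}{j} p^j q^{n-j}$ of the binomial.

```python
from math import comb

def expand_binomial(p, n):
    """Proportions of sets of n containing 0, 1, ..., n P's, built one member at a time."""
    q = 1.0 - p
    coeffs = [1.0]
    for _ in range(n):
        nxt = [0.0] * (len(coeffs) + 1)
        for j, c in enumerate(coeffs):
            nxt[j] += q * c        # next member is a Q: number of P's unchanged
            nxt[j + 1] += p * c    # next member is a P: number of P's rises by one
        coeffs = nxt
    return coeffs

def binomial_term(n, j, p):
    """The term in p^j q^(n-j) of (q + p)^n."""
    return comb(n, j) * p**j * (1.0 - p)**(n - j)

terms = expand_binomial(0.3, 12)
print([round(t, 4) for t in terms])
```

The list is indexed by the number of P's, so it corresponds to the increasing form (5.2).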

5.3. Distributions very close to the binomial form occur in practice, particularly in
artificial experiments with coin-tossing or dice-throwing. Some data by Weldon are
shown in Table 5.1. Weldon threw 12 dice 26,306 times and noted the values at each
throw. This is equivalent to the drawing of samples of 12 from a large population. The
occurrence of a 5 or a 6 on any die was regarded as the exhibition of the quality P, a
“success” as we may call it.


TABLE 5.1

Frequency-distribution of 26,306 Throws of 12 Dice, the Occurrence of a 5 or 6 being
counted a Success.

No. of          Observed       Theoretical Frequency from the Binomial
Successes.      Frequency.     $26{,}306 \, (0.6623 + 0.3377)^{12}$
0                   185              187
1                 1,149            1,146
2                 3,265            3,215
3                 5,475            5,465
4                 6,114            6,269
5                 5,194            5,115
6                 3,067            3,043
7                 1,331            1,330
8                   403              424
9                   105               96
10 and over          18               16
Total            26,306           26,306


If the dice were perfect (a condition rarely realised in practice) the proportion $p$ of
successes would be $\tfrac{1}{3}$; and the appropriate binomial would be, in the form (5.2), $(\tfrac{2}{3} + \tfrac{1}{3})^{12}$.
In this particular case the dice were not quite perfect, the proportion of cases exhibiting
a 5 or a 6 being 0.3377. Taking this as the value of $p$, we get the frequency function
$(0.6623 + 0.3377)^{12}$ which when multiplied by the total frequency 26,306 gives the
theoretical frequencies shown in the third column of Table 5.1. The agreement with
observation is evidently fairly good.
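The theoretical column of Table 5.1 is easily recomputed. The sketch below uses only the figures quoted in the text (12 dice, 26,306 throws, $p = 0.3377$); the constant and function names are ours.

```python
from math import comb

TOTAL_THROWS = 26306
N_DICE = 12
P_SUCCESS = 0.3377  # observed proportion of 5's and 6's quoted in the text

def theoretical_frequencies():
    """Terms of 26,306 (0.6623 + 0.3377)^12, with 10, 11 and 12 successes pooled."""
    q = 1.0 - P_SUCCESS
    freqs = [TOTAL_THROWS * comb(N_DICE, j) * P_SUCCESS**j * q**(N_DICE - j)
             for j in range(N_DICE + 1)]
    return freqs[:10] + [sum(freqs[10:])]

for successes, freq in enumerate(theoretical_frequencies()):
    label = "10 and over" if successes == 10 else str(successes)
    print(label, round(freq, 1))
```

Rounded to the nearest unit, the output should agree with the tabulated theoretical frequencies to within a unit.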


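5.4. Taking our variate to be increasing, we have, from (5.2), that the frequency at
$x = j$ is $\binom{n}{j} q^{n-j} p^j$. The characteristic function of the distribution is then

$$\phi(t) = (q + p e^{it})^n. \qquad (5.3)$$

We then have for the moment of order $j$ about the origin, from (3.11),

$$\mu'_j = (-i)^j \left[ \frac{d^j}{dt^j} (q + p e^{it})^n \right]_{t=0},$$

and hence $\mu'_1$, $\mu'_2$, and so on. We find

$$\mu'_1 = np, \qquad \mu'_2 = np + n(n - 1)p^2, \qquad (5.4)$$

$$\mu_2 = npq, \qquad (5.5)$$

$$\mu_3 = npq(q - p), \qquad (5.6)$$

$$\mu_4 = 3n^2 p^2 q^2 + npq(1 - 6pq), \qquad (5.7)$$

and hence

$$\gamma_1 = \sqrt{\beta_1} = \frac{q - p}{(npq)^{1/2}}, \qquad (5.8)$$

$$\gamma_2 = \beta_2 - 3 = \frac{1 - 6pq}{npq}. \qquad (5.9)$$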
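The moments about the mean may be checked by direct summation over the terms of the binomial. The following sketch is our own; it takes the closed forms to be $\mu_2 = npq$, $\mu_3 = npq(q - p)$ and $\mu_4 = 3n^2p^2q^2 + npq(1 - 6pq)$ and compares them with the sums.

```python
from math import comb

def central_moment(n, p, r):
    """Moment of order r about the mean np, summed directly over the binomial terms."""
    q = 1.0 - p
    mean = n * p
    return sum((j - mean) ** r * comb(n, j) * p**j * q**(n - j)
               for j in range(n + 1))

def closed_forms(n, p):
    """Closed expressions for mu_2, mu_3 and mu_4 of the binomial (q + p)^n."""
    q = 1.0 - p
    return {
        2: n * p * q,
        3: n * p * q * (q - p),
        4: 3 * n * n * p * p * q * q + n * p * q * (1 - 6 * p * q),
    }

n, p = 12, 0.3377
for order, value in closed_forms(n, p).items():
    print(order, value, central_moment(n, p, order))
```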








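5.5. Further formulae are not often required, but when they are they can be derived
from some interesting recurrence relations connecting the moments of the binomial.

Writing $\theta = it$ we have, for the characteristic function referred to the mean as origin,

$$\phi(t) = e^{-np\theta} (q + p e^{\theta})^n. \qquad (5.10)$$

Differentiating with respect to $\theta$ we find

$$\sum_j \mu_j \frac{\theta^{j-1}}{(j-1)!} = -np \, e^{-np\theta} (q + p e^{\theta})^n + n e^{-np\theta} (q + p e^{\theta})^{n-1} p e^{\theta} = \left\{ \sum_j \mu_j \frac{\theta^j}{j!} \right\} \left\{ -np + \frac{np e^{\theta}}{q + p e^{\theta}} \right\},$$

and hence, after a little re-arrangement,

$$(q + p e^{\theta}) \sum_j \mu_j \frac{\theta^{j-1}}{(j-1)!} - npq (e^{\theta} - 1) \sum_j \mu_j \frac{\theta^j}{j!} = 0.$$

Identifying coefficients in $\theta^{r-1}$ we get

$$\mu_r = npq \sum_{j=0}^{r-2} \binom{r-1}{j} \mu_j - p \sum_{j=0}^{r-2} \binom{r-1}{j} \mu_{j+1}, \qquad (5.11)$$

giving the moment of order $r$ about the mean in terms of those of lower orders.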
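A recurrence of this kind is conveniently verified in exact arithmetic. The sketch below assumes that (5.11) reads $\mu_r = npq \sum_{j=0}^{r-2} \binom{r-1}{j} \mu_j - p \sum_{j=0}^{r-2} \binom{r-1}{j} \mu_{j+1}$ and compares the moments it generates with those found by direct summation; the function names are illustrative.

```python
from fractions import Fraction
from math import comb

def central_moments_by_recurrence(n, p, max_order):
    """mu_0 .. mu_max_order of the binomial from the assumed form of (5.11)."""
    q = 1 - p
    mu = [Fraction(1), Fraction(0)]          # mu_0 = 1, mu_1 = 0
    for r in range(2, max_order + 1):
        first = sum(comb(r - 1, j) * mu[j] for j in range(r - 1))
        second = sum(comb(r - 1, j) * mu[j + 1] for j in range(r - 1))
        mu.append(n * p * q * first - p * second)
    return mu

def central_moment_direct(n, p, r):
    """Moment of order r about the mean by summation over the binomial terms."""
    q = 1 - p
    return sum((j - n * p) ** r * comb(n, j) * p**j * q**(n - j)
               for j in range(n + 1))

p = Fraction(1, 3)
for r, value in enumerate(central_moments_by_recurrence(12, p, 6)):
    print(r, value, central_moment_direct(12, p, r))
```

Using `Fraction` for $p$ keeps every quantity rational, so the two columns can be compared for exact equality rather than to within rounding error.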
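Furthermore, writing the moment about the mean as

$$\mu_r = \sum_{j=0}^{n} (j - np)^r \binom{n}{j} q^{n-j} p^j,$$

we have, differentiating with respect to $p$,

$$\frac{d\mu_r}{dp} = -nr \sum (j - np)^{r-1} \binom{n}{j} q^{n-j} p^j + \sum (j - np)^r \binom{n}{j} \left\{ j q^{n-j} p^{j-1} - (n - j) q^{n-j-1} p^j \right\}.$$

The first term on the right is $-nr\mu_{r-1}$. The sum of the other two will be found to be

$$\frac{1}{pq} \sum (j - np)^{r+1} \binom{n}{j} q^{n-j} p^j = \frac{\mu_{r+1}}{pq}.$$

Hence we find

$$\mu_{r+1} = pq \left\{ nr\mu_{r-1} + \frac{d\mu_r}{dp} \right\}. \qquad (5.12)$$

For example, $\mu_1 = 0$, $\mu_2 = npq = np(1 - p)$ and hence

$$\mu_3 = pq\{n - 2np\} = npq(q - p)$$

as stated in (5.6).

For factorial moments the expressions assume a particularly simple form. In fact,
differentiating $(q + p)^n$ $r$ times partially with respect to $p$ and multiplying by $p^r$ we have

$$n^{[r]} (q + p)^{n-r} p^r = \sum_j j^{[r]} \binom{n}{j} q^{n-j} p^j = \mu'_{[r]},$$

and hence since $q + p = 1$

$$\mu'_{[r]} = n^{[r]} p^r.$$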





5.6. If $p = q$ the binomial distribution is obviously symmetrical. If $p \ne q$ the
distribution is skew. But in both cases it will be unimodal unless $pn$ is small. For the
frequency of the term in $p^{r+1}$ is greater than that of the term in $p^r$ so long as

$$\binom{n}{r+1} q^{n-r-1} p^{r+1} > \binom{n}{r} q^{n-r} p^r$$

or

$$\frac{n!}{(n - r)! \, r!} \cdot \frac{(n - r - 1)! \, (r + 1)!}{n!} < \frac{p}{q}$$

or

$$\frac{r + 1}{n - r} < \frac{p}{q},$$

which is equivalent to

$$\frac{r + 1}{n + 1} < p.$$

Hence the frequency increases until the point when $(r + 1) > p(n + 1)$ and then declines
again. Some typical distributions are shown in Table 5.2.
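The rule for the position of the greatest term can be tried on the cases $n = 20$, $p = 0.1$ to $0.5$. The sketch below is ours; it assumes $p(n + 1)$ is not a whole number, in which case the greatest term is that with $r$ successes, $r$ being the integral part of $p(n + 1)$.

```python
from math import comb, floor

def binomial_terms(n, p):
    """Terms of (q + p)^n in order of increasing number of successes."""
    q = 1.0 - p
    return [comb(n, j) * p**j * q**(n - j) for j in range(n + 1)]

def mode_by_search(n, p):
    """Number of successes carrying the greatest frequency, found by inspection."""
    terms = binomial_terms(n, p)
    return max(range(n + 1), key=terms.__getitem__)

def mode_by_rule(n, p):
    """Integral part of p(n + 1); valid when p(n + 1) is not an integer."""
    return floor(p * (n + 1))

for p in (0.1, 0.2, 0.3, 0.4, 0.5):
    print(p, mode_by_search(20, p), mode_by_rule(20, p))
```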


TABLE 5.2

Terms of the Binomial Distribution $10{,}000 \, (q + p)^{20}$ for Values of $p$ from 0.1 to 0.5.
(Figures given to the nearest unit.)

Number of      p = 0.1    p = 0.2    p = 0.3    p = 0.4    p = 0.5
Successes.     q = 0.9    q = 0.8    q = 0.7    q = 0.6    q = 0.5
 0              1216        115          8          -          -
 1              2702        576         68          5          -
 2              2852       1369        278         31          2
 3              1901       2054        716        123         11
 4               898       2182       1304        350         46
 5               319       1746       1789        746        148
 6                89       1091       1916       1244        370
 7                20        545       1643       1659        739
 8                 4        222       1144       1797       1201
 9                 1         74        654       1597       1602
10                 -         20        308       1171       1762
11                 -          5        120        710       1602
12                 -          1         39        355       1201
13                 -          -         10        146        739
14                 -          -          2         49        370
15                 -          -          -         13        148
16                 -          -          -          3         46
17                 -          -          -          -         11
18                 -          -          -          -          2
19                 -          -          -          -          -
20                 -          -          -          -          -


5.7. The ordinates of the binomial are most directly calculated from the formula
$\binom{n}{j} q^{n-j} p^j$; for low values of $n$ the calculation is straightforward and for high values
assistance can be derived from tables of $\log n!$. The calculation of the distribution function,
which is equivalent to the summation of terms of the binomial, is tedious to perform directly,
but use may be made of the tables of the incomplete B-function. We have, for Taylor's
series with the integral form of remainder,

$$f(a + h) = \sum_{j=0}^{r-1} \frac{h^j}{j!} f^{(j)}(a) + \int_0^1 \frac{h^r (1 - t)^{r-1}}{(r - 1)!} f^{(r)}(a + th) \, dt. \qquad (5.13)$$

Putting $a = q$, $h = p$ and $f(a + h) = (q + p)^n$ we have

$$(q + p)^n = \sum_{j=0}^{r-1} \binom{n}{j} q^{n-j} p^j + R_r,$$

where $R_r$ is the remainder after $r$ terms and equals

$$R_r = \int_0^1 \frac{p^r (1 - t)^{r-1}}{(r - 1)!} \frac{n!}{(n - r)!} (q + pt)^{n-r} \, dt. \qquad (5.14)$$

In (5.14) put $t = 1 - x/p$. We find

$$R_r = \frac{n!}{(r - 1)! \, (n - r)!} \int_0^p x^{r-1} (1 - x)^{n-r} \, dx = \frac{B_p(r, n - r + 1)}{B(r, n - r + 1)} = I_p(r, n - r + 1) \qquad (5.15)$$

in the usual notation. This is also equal to

$$1 - I_q(n - r + 1, r) \qquad (5.16)$$

by a well-known property of the B-function.

The remainder after $r - 1$ terms is, similarly, $I_p(r - 1, n - r + 2)$, and hence the
$r$th term is

$$-I_p(r, n - r + 1) + I_p(r - 1, n - r + 2) = I_q(n - r + 1, r) - I_q(n - r + 2, r - 1). \qquad (5.17)$$

Example 5.1

When $n = 20$, $r = 11$, $p = 0.4$ we have for the remainder after 11 terms $I_{0.4}(11, 10)$
which from the tables is found to be 0.127,521,2. The value given by summing the last six
terms in the appropriate column of Table 5.2 is 0.1276, the error in the last place being due to
rounding up. The remainder after 12 terms is $I_{0.4}(12, 9) = 0.056,526,4$. The 12th term
(“11 successes”) is then the difference of these two remainders = 0.0710, as shown in
Table 5.2 for the frequency per 10,000 of 11 successes.


The Poisson Distribution

5.8. Cases sometimes occur in which the proportion $p$ of “successes” in the population
is very small. We may suppose our number $n$ large enough to render $np$ itself appreciable
though $p$ is small; and we are thus led to consider the limiting form of the binomial
(5.2) as $p \to 0$ subject to the condition that $np$ remains finite, and equal to $\lambda$, say.
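Under these conditions the term

$$\binom{n}{r} q^{n-r} p^r = \frac{n!}{(n - r)! \, r!} \left( \frac{\lambda}{n} \right)^r \left( 1 - \frac{\lambda}{n} \right)^{n-r} \to \frac{\lambda^r}{r!} e^{-\lambda}.$$

Thus the terms of the binomial become the successive terms

$$e^{-\lambda} \left\{ 1, \; \lambda, \; \frac{\lambda^2}{2!}, \; \ldots, \; \frac{\lambda^r}{r!}, \; \ldots \right\}. \qquad (5.18)$$

This is called the Poisson distribution, having been given first by Poisson in 1837. It
has since been discovered independently by several other writers.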
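The passage to the limit may be watched numerically. In the sketch below (an illustration of ours, with $\lambda = 2$ chosen arbitrarily) the binomial terms with $p = \lambda / n$ are compared with $e^{-\lambda} \lambda^r / r!$ for increasing $n$.

```python
from math import comb, exp, factorial

def binomial_term(n, r, lam):
    """Term with r successes of the binomial with index n and p = lam / n."""
    p = lam / n
    return comb(n, r) * p**r * (1.0 - p)**(n - r)

def poisson_term(r, lam):
    """The limiting term e^(-lam) lam^r / r!."""
    return exp(-lam) * lam**r / factorial(r)

def worst_gap(n, lam, r_max=10):
    """Largest absolute difference between the two sets of terms for r = 0..r_max."""
    return max(abs(binomial_term(n, r, lam) - poisson_term(r, lam))
               for r in range(r_max + 1))

for n in (20, 100, 10000):
    print(n, worst_gap(n, 2.0))
```

The discrepancy falls off roughly in proportion to $1/n$.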

From the point of view of characteristic functions we have

$$\phi(t) = \lim (q + p e^{it})^n = \lim \left\{ 1 + \frac{\lambda (e^{it} - 1)}{n} \right\}^n = \exp\{\lambda (e^{it} - 1)\}, \qquad (5.19)$$

which is readily verified to result in the distribution (5.18).

Thus

$$\psi(t) = \lambda (e^{it} - 1) = \lambda \sum_{r=1}^{\infty} \frac{(it)^r}{r!}$$

and hence all cumulants of the Poisson distribution are equal to $\lambda$. We thus find

$$\mu'_1 = \lambda, \qquad \mu_2 = \lambda, \qquad \mu_3 = \lambda, \qquad \mu_4 = \lambda + 3\lambda^2. \qquad (5.20)$$

If we let $n \to \infty$ in (5.11) and (5.12) we find

$$\mu_r = \lambda \sum_{j=0}^{r-2} \binom{r-1}{j} \mu_j \qquad (5.21)$$

and

$$\mu_{r+1} = r\lambda \mu_{r-1} + \lambda \frac{d\mu_r}{d\lambda}. \qquad (5.22)$$


5.9. Tables of the function $e^{-\lambda} \lambda^r / r!$ for various values of $\lambda$ and $r$ are given in Tables
for Statisticians and Biometricians, Part I. The frequency polygons are very skew, almost
J-shaped for low values of $\lambda$, but become nearer to unimodal symmetry as $\lambda$ increases.
A comparison of the successive terms $\lambda^r / r!$ and $\lambda^{r+1} / (r + 1)!$ shows that the frequency increases
up to the point for which $r + 1 < \lambda$ and then decreases again.




The summation of $r$ terms in the Poisson distribution may be carried out in a manner
similar to that of 5.7. The remainder after $r$ terms of the distribution is found to be,
from (5.13),

$$R_r = \frac{\lambda^r}{\Gamma(r)} \int_0^1 e^{-\lambda(1 - t)} (1 - t)^{r-1} \, dt = \frac{\Gamma_{\lambda}(r)}{\Gamma(r)} = I\!\left( \frac{\lambda}{\sqrt{r}}, \; r - 1 \right) \qquad (5.23)$$

in the notation of Pearson's tables of the Incomplete $\Gamma$-function. The argument used
in these tables is a difficult one to work with in the present case and, though formula (5.23)
may be used for summing a number of terms in the Poisson distribution, it is easier to
calculate directly rather than to use an analogous expression to (5.17) in the form

$$r\text{th term} = I\!\left( \frac{\lambda}{\sqrt{r - 1}}, \; r - 2 \right) - I\!\left( \frac{\lambda}{\sqrt{r}}, \; r - 1 \right).$$
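The connection between the remainders and the incomplete B- and $\Gamma$-functions can be checked without tables. The sketch below is ours: it takes the remainder after $r$ terms of the binomial to be $I_p(r, n - r + 1)$ and that of the Poisson distribution to be $\frac{1}{\Gamma(r)} \int_0^{\lambda} e^{-t} t^{r-1} \, dt$, and evaluates the integrals by Simpson's rule, a stand-in of our own choosing for the tables. The case $n = 20$, $r = 11$, $p = 0.4$ is that of Example 5.1.

```python
import math
from math import comb

def simpson(f, a, b, steps=2000):
    """Composite Simpson rule on [a, b]; steps must be even."""
    h = (b - a) / steps
    total = f(a) + f(b)
    for k in range(1, steps):
        total += (4 if k % 2 else 2) * f(a + k * h)
    return total * h / 3.0

def incomplete_beta_ratio(x, a, b):
    """I_x(a, b) for a, b >= 1, by numerical integration."""
    log_beta = math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)
    integral = simpson(lambda u: u**(a - 1) * (1 - u)**(b - 1), 0.0, x)
    return integral / math.exp(log_beta)

def incomplete_gamma_ratio(lam, r):
    """(1 / Gamma(r)) * integral from 0 to lam of e^(-t) t^(r-1) dt, for r >= 1."""
    integral = simpson(lambda t: math.exp(-t) * t**(r - 1), 0.0, lam)
    return integral / math.exp(math.lgamma(r))

def binomial_tail(n, p, r):
    """Sum of the binomial terms with r or more successes."""
    return sum(comb(n, j) * p**j * (1 - p)**(n - j) for j in range(r, n + 1))

def poisson_tail(lam, r):
    """Sum of the Poisson terms from the (r + 1)th onwards, i.e. r or more events."""
    return 1.0 - sum(math.exp(-lam) * lam**j / math.factorial(j) for j in range(r))

print(incomplete_beta_ratio(0.4, 11, 10), binomial_tail(20, 0.4, 11))
print(incomplete_gamma_ratio(3.0, 4), poisson_tail(3.0, 4))
```

The difference `binomial_tail(20, 0.4, 11) - binomial_tail(20, 0.4, 12)` gives the single term for 11 successes.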


5.10. We now consider a generalisation of the binomial and the Poisson distributions.
In 5.2 our approach was based on the drawing of sets of $n$ from the same population.
Suppose, however, we draw them from $n$ different populations with proportions

$$(p_1, q_1), \; (p_2, q_2), \; \ldots, \; (p_n, q_n).$$

Then our proportional frequencies will be arrayed by the form

$$(p_1 + q_1)(p_2 + q_2) \ldots (p_n + q_n) = \prod_{j=1}^{n} (p_j + q_j) \qquad (5.24)$$

which of course reduces to the binomial if all the $p$'s are equal.

The characteristic function of this distribution is

$$\phi(t) = \prod (q_j + p_j e^{it})$$

from which we have

$$\psi(t) = \sum \log (q_j + p_j e^{it}) = \sum \log \left\{ 1 + p_j (it) + p_j \frac{(it)^2}{2!} + \ldots \right\} = (it) \sum p_j + \frac{(it)^2}{2!} \sum (p_j - p_j^2) + \text{etc.},$$

giving

$$\kappa_1 = \mu'_1 = \sum p_j, \qquad \kappa_2 = \mu_2 = \sum p_j q_j. \qquad (5.25)$$

Writing now $\bar{p}$ for the mean of the $p$'s in the different populations, we have

$$\mu'_1 = n\bar{p},$$

$$\mu_2 = \sum pq = \sum p - \sum p^2 = \sum p - \frac{1}{n} \left( \sum p \right)^2 - \left\{ \sum p^2 - \frac{1}{n} \left( \sum p \right)^2 \right\} = n\bar{p} - n\bar{p}^2 - n \operatorname{var} p$$

(where $\operatorname{var} p$ is written for the variance of $p$)

$$= n\bar{p}\bar{q} - n \operatorname{var} p. \qquad (5.26)$$





A comparison of these results with those for the binomial shows that the variance 
of the distribution (5.24) is less than that of the binomial with the same average p by an 
amount equal to n times the variance of p. 
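The reduction of variance can be confirmed exactly for any chosen set of proportions. The sketch below is ours; the four values of $p$ are arbitrary, and the comparison made is between the variance of the distribution (5.24) and $n\bar{p}\bar{q} - n \operatorname{var} p$.

```python
from fractions import Fraction

def poisson_binomial_pmf(ps):
    """Distribution of the number of P's when one member is taken from each population."""
    pmf = [Fraction(1)]
    for p in ps:
        nxt = [Fraction(0)] * (len(pmf) + 1)
        for j, w in enumerate(pmf):
            nxt[j] += (1 - p) * w      # a Q from this population
            nxt[j + 1] += p * w        # a P from this population
        pmf = nxt
    return pmf

def mean_and_variance(pmf):
    """Mean and variance of a distribution given as frequencies at 0, 1, 2, ..."""
    mean = sum(j * w for j, w in enumerate(pmf))
    var = sum((j - mean) ** 2 * w for j, w in enumerate(pmf))
    return mean, var

ps = [Fraction(1, 10), Fraction(1, 5), Fraction(1, 2), Fraction(7, 10)]
n = len(ps)
p_bar = sum(ps) / n
var_p = sum((p - p_bar) ** 2 for p in ps) / n
mean, var = mean_and_variance(poisson_binomial_pmf(ps))
print(mean, n * p_bar)
print(var, n * p_bar * (1 - p_bar) - n * var_p)
```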

Similarly we see that, for the Poisson distribution in such a case

$$\mu'_1 = \bar{\lambda}, \ \text{the mean of the } \lambda\text{'s}, \qquad (5.27)$$

and

$$\mu_2 = \bar{\lambda} - \frac{1}{n} \operatorname{var}(np) = \bar{\lambda} \ \text{to order } n^{-1}.$$

The Poisson form thus holds for (5.24) notwithstanding the inequality of the $p$'s, provided
that the variance of $\lambda$ is small compared with $n$, which will be so if all the $p$'s are small.

5.11. Consider now the case when successive sets of $n$ are drawn from different populations
characterised by $p_1, p_2, \ldots, p_k$. In the previous case we supposed any set of $n$ obtained
by taking one from each of $n$ populations.

We now suppose that any set is drawn from one population only, but that different
sets come from different populations. Our array of frequencies will now be

$$\frac{1}{k} \sum_{j=1}^{k} (q_j + p_j)^n \qquad (5.28)$$

and evidently the moments of this array are the means of the moments of the $(q_j + p_j)^n$,
that is to say, from (5.4),

$$\mu'_1 = \frac{1}{k} \sum np, \qquad \mu'_2 = \frac{1}{k} \sum \{ np + n(n - 1)p^2 \}.$$

Writing $\bar{p}$ for the mean of the $p$'s as before, we have

$$\mu'_1 = n\bar{p},$$

$$\mu_2 = n\bar{p} + \frac{1}{k} \sum n(n - 1)p^2 - n^2 \bar{p}^2 = n\bar{p}\bar{q} + \frac{1}{k} \sum n(n - 1)p^2 - n(n - 1)\bar{p}^2 = n\bar{p}\bar{q} + n(n - 1) \operatorname{var} p. \qquad (5.29)$$

In this case the variance is greater than what it would be if the distribution were of the
ordinary binomial type by an amount $n(n - 1) \operatorname{var} p$.
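The excess of variance is likewise easily confirmed in exact arithmetic. The sketch below is our own; it averages the binomials $(q_j + p_j)^n$ over three arbitrarily chosen populations and compares the variance of the result with $n\bar{p}\bar{q} + n(n - 1) \operatorname{var} p$.

```python
from fractions import Fraction
from math import comb

def mixture_pmf(ps, n):
    """Average of the binomials (q_j + p_j)^n over the k populations."""
    k = len(ps)
    return [sum(comb(n, j) * p**j * (1 - p)**(n - j) for p in ps) / k
            for j in range(n + 1)]

def mean_and_variance(pmf):
    """Mean and variance of a distribution given as frequencies at 0, 1, 2, ..."""
    mean = sum(j * w for j, w in enumerate(pmf))
    var = sum((j - mean) ** 2 * w for j, w in enumerate(pmf))
    return mean, var

ps = [Fraction(1, 5), Fraction(3, 10), Fraction(1, 2)]
n = 8
p_bar = sum(ps) / len(ps)
var_p = sum((p - p_bar) ** 2 for p in ps) / len(ps)
mean, var = mean_and_variance(mixture_pmf(ps, n))
print(mean, n * p_bar)
print(var, n * p_bar * (1 - p_bar) + n * (n - 1) * var_p)
```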

For the Poisson distribution we have, on taking limits,

$$\mu'_1 = \bar{\lambda}, \qquad \mu_2 = \bar{\lambda} + \operatorname{var} \lambda, \qquad (5.30)$$

and here also the variance of the distribution is affected.


5.12. The results of the two preceding sections enable us to discuss the occurrence
of the binomial and the Poisson distributions in practice. An example has already been
given in Table 5.1 of a distribution conforming to the simple binomial type. It is not
easy to find material compiled outside the laboratory which does so.

For example, suppose we regard the possession of blue eyes as a success, and take
a number of samples of $n$ from the population of the United Kingdom in different localities.
We should probably find that the proportions in these samples did not conform to the
simple binomial form. The variance $npq$ calculated from the known $n$ and observed $p$ would
probably turn out to be too small. If so we should conclude from (5.29) that the proportion
$p$ varied from place to place in the population, the deficiency in the variance of the proportions
observed being due to the variance of $p$ itself in the sections of the population from
which the samples were chosen. We are assuming for the time being that these differences
are not explicable on the basis of sampling fluctuation alone; but a full discussion will
have to wait until later chapters.

5.13. The same effect is found in distributions which at first sight might be expected
to be of the Poisson type. For example, suicide is a rare event and it might be supposed
that if we took a series of large samples, say the population of the United Kingdom in
successive years, the frequencies of suicides would follow the Poisson distribution. This,
however, is not necessarily so, for all members of the population are not equally exposed
to risk and the temptation to suicide may vary from year to year, e.g. being greater in
years of trade depression. This inequality of risk is typical of one field in which the Poisson
distribution has been freely applied, namely, industrial accidents. Table 5.3 shows, in
the second column, the frequency of accidents occurring to women working on the manufacture
of shells. The Poisson frequencies shown in the third column provide a very
poor fit. The reason is that the liability of individuals to accident varies.


TABLE 5.3

Accidents to 647 women working on H.E. shells in 5 weeks.
(Greenwood and Yule (1920), J. Roy. Statist. Soc., 83, 255.)

Number of      Observed       Poisson Distribution     Distribution given
Accidents.     Frequency.     with same Mean.          by (5.33).
0                 447              406                      442
1                 132              189                      140
2                  42               45                       45
3                  21                7                       14
4                   3                1                        5
5                   2                0.1                      2
Totals            647              648                      648


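As a working hypothesis (cf. Greenwood and Yule, 1920) suppose that the population
is composed of individuals with different degrees of accident proneness, represented by
different values of $\lambda$ in a Poisson distribution; and suppose that in the population the
distribution of $\lambda$ is given by

$$dF = \frac{c^p}{\Gamma(p)} e^{-c\lambda} \lambda^{p-1} \, d\lambda, \quad 0 \le \lambda < \infty. \qquad (5.31)$$

There are theoretical reasons justifying this supposition.

The frequency of $j$ successes is then

$$\frac{c^p}{\Gamma(p)} \int_0^{\infty} e^{-(c+1)\lambda} \frac{\lambda^{p+j-1}}{j!} \, d\lambda$$

or the coefficient of $t^j$ in

$$\frac{c^p}{\Gamma(p)} \int_0^{\infty} e^{-(c+1-t)\lambda} \lambda^{p-1} \, d\lambda,$$

which, on the substitution of $(c + 1 - t)\lambda = u$, becomes

$$\frac{c^p}{(c + 1 - t)^p} = \left( \frac{c}{c + 1} \right)^p \left( 1 - \frac{t}{c + 1} \right)^{-p}. \qquad (5.32)$$

The frequency of 0, 1, 2, ... successes is therefore

$$\left( \frac{c}{c + 1} \right)^p \left\{ 1, \; \frac{p}{c + 1}, \; \frac{p(p + 1)}{2! \, (c + 1)^2}, \; \ldots \right\}. \qquad (5.33)$$

The mean is thus

$$\mu'_1 = \left( \frac{c}{c + 1} \right)^p \left\{ \frac{p}{c + 1} + \frac{p(p + 1)}{(c + 1)^2} + \ldots \right\} = \left( \frac{c}{c + 1} \right)^p \frac{p}{c + 1} \left( \frac{c + 1}{c} \right)^{p+1} = \frac{p}{c}. \qquad (5.34)$$

Similarly

$$\mu'_2 = \frac{p(p + 1)}{c^2} + \frac{p}{c},$$

so that

$$\mu_2 = \frac{p(c + 1)}{c^2} = \frac{p}{c^2} + \frac{p}{c}. \qquad (5.35)$$

If we now put the observed mean and variance of Table 5.3 equal to the values of (5.34)
and (5.35) we have two equations which can be solved for $p$ and $c$. The distribution (5.33)
can then be found. The frequencies are given in the fourth column of Table 5.3 and evidently
give a much better agreement with the facts.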
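The fitting may be retraced as follows. The sketch below is ours and makes two assumptions: that the mean and variance of the fitted distribution are $p/c$ and $p(c + 1)/c^2$, and that the observed variance is computed with divisor 647 and no grouping correction. Small differences from the published fitted frequencies are therefore to be expected.

```python
import math

observed = [447, 132, 42, 21, 3, 2]   # accidents 0..5, second column of Table 5.3

def fit_by_moments(freqs):
    """Solve mean = p/c and variance = p(c + 1)/c^2 for p and c."""
    total = sum(freqs)
    mean = sum(j * f for j, f in enumerate(freqs)) / total
    var = sum(j * j * f for j, f in enumerate(freqs)) / total - mean**2
    c = mean / (var - mean)
    p = mean * c
    return p, c

def fitted_frequencies(p, c, total, j_max):
    """total * (c/(c+1))^p * p(p+1)...(p+j-1) / (j! (c+1)^j) for j = 0..j_max."""
    out = []
    term = (c / (c + 1.0)) ** p
    for j in range(j_max + 1):
        out.append(total * term)
        term *= (p + j) / ((j + 1) * (c + 1.0))
    return out

def poisson_frequencies(mean, total, j_max):
    """Poisson frequencies with the same mean, for comparison."""
    return [total * math.exp(-mean) * mean**j / math.factorial(j)
            for j in range(j_max + 1)]

p, c = fit_by_moments(observed)
print(round(p, 4), round(c, 4))
print([round(f, 1) for f in fitted_frequencies(p, c, sum(observed), 5)])
print([round(f, 1) for f in poisson_frequencies(p / c, sum(observed), 5)])
```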


5.14. The interesting feature of the distribution (5.32) is that it is a binomial with 
negative index. In the approach adopted in 5.2 the index is necessarily positive ; but it 
is often found that observational materials are represented by negatively indexed binomials. 
Yule (1910) • has given an illustration of this effect which does not depend on any arbitrary 
assumption about distributions such as that embodied in (5.31). Suppose, in fact, that 
wo have a population subjected to recurring attacks of a disease, that r attacks are fatal 
and that on the average one attack is fatal to a proportion p of individuals at risk, the 
actual numbers succumbing varying as if the population wore chosen at random from a 
larger population in which the propoi-tion of survivors is p. Consider the proportion of 
individuals surviving 0 , 1 , . . . attacks at the wth exposure. Evidently this is the propor¬ 
tion of successes in samples of a when the chance of success is p, i.e. {q 4 - p)^. The proportion 
of survivors at the end of n exposures will be the sum of the first r terms in this series. 


* J. Roy. Statist. Soc., 73, 26. 






STANDARD DISTRIBUTIONS 


The proportion of survivors at the end of (n - 1) exposures will be the sum of the first
r terms in $(q + p)^{n-1}$. Consequently the proportion dying during the nth exposure is
the difference,

$$\sum_{j=0}^{r-1}\binom{n-1}{j}p^{j}q^{n-1-j} - \sum_{j=0}^{r-1}\binom{n}{j}p^{j}q^{n-j} = \binom{n-1}{r-1}p^{r}q^{n-r}.$$

Thus, since death does not commence till the rth exposure, for the values of n from
r onwards we have the proportion of deaths

$$p^{r},\quad r\,p^{r}q,\quad \frac{r(r+1)}{2!}\,p^{r}q^{2},\ \ldots \tag{5.36}$$

i.e. successive terms in $p^{r}(1-q)^{-r}$, a binomial with negative index. A law of this kind
has been found to operate in experiments on the killing of bacteria by disinfectants.
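The identity behind (5.36) is easily checked numerically. The sketch below is an illustration only; the function names `survivors` and `deaths_at` and the values of r and p in the usage lines are assumptions introduced here.

```python
from math import comb


def survivors(n, r, p):
    """Sum of the first r terms of (q + p)^n: proportion with fewer than r attacks."""
    q = 1.0 - p
    return sum(comb(n, j) * p**j * q**(n - j) for j in range(r))


def deaths_at(n, r, p):
    """Proportion dying at the nth exposure: C(n-1, r-1) p^r q^(n-r)."""
    return comb(n - 1, r - 1) * p**r * (1.0 - p)**(n - r)


# Assumed illustrative values: r = 3 fatal attacks, p = 0.2.
r, p = 3, 0.2
differences = [survivors(n - 1, r, p) - survivors(n, r, p) for n in range(r, 12)]
direct = [deaths_at(n, r, p) for n in range(r, 12)]
```

The two lists agree term by term, and the terms of `direct`, continued indefinitely, sum to unity, as they must if every individual eventually succumbs.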


The Hypergeometric Distribution 


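5.15. Consider now the generalisation of the approach of 5.2 when samples of n are
drawn from a population of N individuals, where N is not necessarily large. If we take
a sample which contains r P's and n - r Q's, it can arise in a proportion

$$\binom{n}{r}\frac{Np(Np-1)\ldots(Np-r+1)\;Nq(Nq-1)\ldots(Nq-n+r+1)}{N(N-1)\ldots(N-n+1)} = \binom{n}{r}\frac{(Np)^{[r]}(Nq)^{[n-r]}}{N^{[n]}} \tag{5.37}$$

of the possible ways, where $x^{[r]}$ denotes $x(x-1)\ldots(x-r+1)$. For there are $\binom{N}{n}$ ways of selecting the sample, and the r P's can be chosen in
$\binom{Np}{r}$ and the n - r Q's in $\binom{Nq}{n-r}$ ways, the expression given in (5.37) being

$$\binom{Np}{r}\binom{Nq}{n-r}\bigg/\binom{N}{n}.$$

Hence we are led to consider the discontinuous distribution whose general term is

$$\frac{1}{N^{[n]}}\binom{n}{r}(Np)^{[r]}(Nq)^{[n-r]},\qquad r = 0, 1, \ldots, n, \tag{5.38}$$

a form in which the analogy with the binomial (5.1) is evident. As $N \to \infty$ the form
(5.38) approaches the binomial.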





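The series

$$\frac{1}{N^{[n]}}\sum_{j=0}^{n}\binom{n}{j}(Np)^{[j]}(Nq)^{[n-j]}x^{j} \tag{5.39}$$

is equal to

$$\frac{(Nq)^{[n]}}{N^{[n]}}\sum_{j=0}^{n}\frac{n^{[j]}(Np)^{[j]}}{(Nq-n+j)^{[j]}}\frac{x^{j}}{j!},$$

where, as before, $x^{[j]} = x(x-1)\ldots(x-j+1)$; that is to say, to a constant multiple of the hypergeometric function

$$F(\alpha, \beta, \gamma, x)$$

if

$$\alpha = -n,\quad \beta = -Np,\quad \gamma = Nq - n + 1. \tag{5.40}$$

The distribution (5.38) is therefore called hypergeometric. We have

$$F(\alpha, \beta, \gamma, x) = 1 + \frac{\alpha\beta}{\gamma}\frac{x}{1!} + \frac{\alpha(\alpha+1)\beta(\beta+1)}{\gamma(\gamma+1)}\frac{x^2}{2!} + \ldots$$

and it is well known that this function satisfies the differential equation

$$x(1-x)\frac{d^2F}{dx^2} + \{\gamma - (\alpha+\beta+1)x\}\frac{dF}{dx} - \alpha\beta F = 0, \tag{5.41}$$

a fact which may be readily verified from the equation itself.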

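If in (5.39) we put $x = e^{\theta}$ ($\theta = it$) we evidently have the characteristic function of
the distribution. On making this substitution in (5.41) we find, after some reduction and
replacement of the values of $\alpha$, $\beta$, $\gamma$ by those of (5.40),

$$(1-e^{\theta})\frac{d^2\phi}{d\theta^2} + \{Nq - n + (n + Np)e^{\theta}\}\frac{d\phi}{d\theta} - nNp\,e^{\theta}\phi = 0. \tag{5.42}$$

Since $\phi = \sum \mu'_r\theta^r/r!$ we find, from the coefficient of $\theta^0$ in this expression,

$$-Nnp + N\mu'_1 = 0,\qquad \mu'_1 = np, \tag{5.43}$$

the same result as for the binomial. The mean of the hypergeometric series is independent
of N.

Taking now the distribution about its mean, and hence substituting $\phi e^{np\theta}$ for $\phi$ in
(5.42), we find

$$(1-e^{\theta})\left[\frac{d^2\phi}{d\theta^2} + \frac{d\phi}{d\theta}\{n(p-q) - Np\} + (N-n)pqn\,\phi\right] + N\frac{d\phi}{d\theta} = 0, \tag{5.44}$$

whence, identifying coefficients in $\theta$, $\theta^2$, $\theta^3$ we find

$$\mu_2 = \frac{npq(N-n)}{N-1}, \tag{5.45}$$

$$\mu_3 = \frac{npq(q-p)(N-n)(N-2n)}{(N-1)(N-2)}, \tag{5.46}$$

$$\mu_4 = \frac{npq(N-n)}{(N-1)(N-2)(N-3)}\left[N(N+1) - 6n(N-n) + 3pq\{N^2(n-2) - Nn^2 + 6n(N-n)\}\right], \tag{5.47}$$

and generally, if E denotes the operation of raising the order of a moment by unity, i.e.
$E\mu_r = \mu_{r+1}$, we have

$$N\mu_{r+1} = \{(1+E)^r - E^r\}\left[\mu_2 - \{Np + n(q-p)\}\mu_1 + npq(N-n)\mu_0\right]. \tag{5.48}$$

As we expect, when $N \to \infty$ these values tend to those of the binomial.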
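The closed forms for the mean and for $\mu_2$, $\mu_3$, $\mu_4$ can be compared with moments computed directly from the terms of the distribution. The following Python lines are a minimal sketch; the function names are hypothetical and the parameter values in the usage line are chosen only for illustration.

```python
from math import comb


def direct_moments(N, n, Np):
    """Mean and central moments mu2, mu3, mu4 computed from the hypergeometric terms."""
    terms = [comb(Np, r) * comb(N - Np, n - r) / comb(N, n) for r in range(n + 1)]
    mean = sum(r * f for r, f in enumerate(terms))

    def mu(k):
        return sum((r - mean) ** k * f for r, f in enumerate(terms))

    return mean, mu(2), mu(3), mu(4)


def formula_moments(N, n, Np):
    """The same quantities from the closed forms (5.43), (5.45), (5.46), (5.47)."""
    p = Np / N
    q = 1.0 - p
    mu2 = n * p * q * (N - n) / (N - 1)
    mu3 = n * p * q * (q - p) * (N - n) * (N - 2 * n) / ((N - 1) * (N - 2))
    bracket = (N * (N + 1) - 6 * n * (N - n)
               + 3 * p * q * (N * N * (n - 2) - N * n * n + 6 * n * (N - n)))
    mu4 = n * p * q * (N - n) / ((N - 1) * (N - 2) * (N - 3)) * bracket
    return n * p, mu2, mu3, mu4


# Assumed illustrative case: N = 52, n = 13, Np = 13.
print(direct_moments(52, 13, 13))
print(formula_moments(52, 13, 13))
```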






5.16. An example of the occurrence of the hypergeometric series in practice is given
in Table 5.4, giving the frequency of occurrence of cards of a certain suit in hands of whist.
Here N is the number of cards in the pack, 52, and n = 13, p = 1/4. The appropriate
series is thus

$$\frac{3400}{52^{[13]}}\sum_{j=0}^{13}\binom{13}{j}\,13^{[j]}\,39^{[13-j]},$$

giving the frequencies shown in the third column. This agreement appears to be reasonably
good.


TABLE 5.4

Distribution of 3400 First Hands at Whist according to Number of Trumps in the Hand.
(K. Pearson, 1924, Biometrika, 16, 172.)

Number of Cards     Observed      Frequency of Hypergeometric
in the Hand         Frequency     Distribution

0                       35              43.5
1                      290             272.2
2                      696             700.0
3                      937             973.5
4                      851             811.3
5                      444             424.0
6                      115             141.3
7                       21              30.0
8                       11               4.0
9 and over               0               0.2

Totals                3400            3400
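The theoretical column of Table 5.4 can be recomputed directly from the hypergeometric terms with N = 52, n = 13 and Np = 13. The sketch below is offered as a check only, not as the method by which the table was originally calculated; the function name is an assumption.

```python
from math import comb


def hypergeometric_term(r, N, n, Np):
    """Proportion of samples of n containing r P's when the population holds Np P's."""
    return comb(Np, r) * comb(N - Np, n - r) / comb(N, n)


# Expected frequencies for 3400 hands of 13 cards from a pack of 52 with 13 trumps.
expected = [3400 * hypergeometric_term(r, 52, 13, 13) for r in range(14)]
for r, value in enumerate(expected[:9]):
    print(r, round(value, 1))
print("9 and over", round(sum(expected[9:]), 1))
```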


5.17. The calculation of frequencies and the summing of series of frequencies is not
so simple a matter as for the binomial, but the incomplete B-function may be used to
give a fairly good approximation. The method consists of fitting a B-curve of type

$$dF = \frac{x^{p-1}(1-x)^{q-1}}{B(p, q)}\,dx,\qquad 0 \le x \le 1,$$

to the distribution and obtaining areas of that curve from the B-tables. Details of the
method and an example are given in the preface to the Tables of the Incomplete B-function.


The Normal Distribution 

5.18. We have already noted in Examples 4.6 and 4.8 that the binomial distribution
and the Poisson distribution both tend, when expressed in standard measure, to the form

$$dF = \frac{1}{\sqrt{2\pi}}\,e^{-\frac{1}{2}x^2}\,dx,\qquad -\infty \le x \le \infty. \tag{5.49}$$

The slightly more general form

$$dF = \frac{1}{\sigma\sqrt{2\pi}}\exp\left\{-\frac{(x-m)^2}{2\sigma^2}\right\}dx,\qquad -\infty \le x \le \infty, \tag{5.50}$$

is known as the normal distribution. It is the most important theoretical distribution in
statistics. The expression (5.49) is of course the normal distribution in standard measure.






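We have for the characteristic function of (5.50)

$$\phi(t) = \exp\left(itm - \tfrac{1}{2}\sigma^2t^2\right), \tag{5.51}$$

giving, for moments about the mean,

$$\mu_{2r} = \frac{(2r)!}{2^r\,r!}\,\sigma^{2r} \tag{5.52}$$

and

$$\mu_{2r+1} = 0, \tag{5.53}$$

so that

$$\kappa_2 = \mu_2 = \sigma^2,\qquad \kappa_r = 0,\quad r > 2. \tag{5.54}$$

We also have $\beta_2 = 3$, $\gamma_2 = 0$, which accounts for the standard adopted for mesokurtosis
in 3.32.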


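5.19. The distribution function of the normal distribution expressed in standard
measure is the integral

$$F(x) = \frac{1}{\sqrt{2\pi}}\int_{-\infty}^{x} e^{-\frac{1}{2}x^2}\,dx. \tag{5.55}$$

The integrand may be expanded and we have

$$F(x) - \tfrac{1}{2} = \frac{1}{\sqrt{2\pi}}\int_0^x e^{-\frac{1}{2}x^2}\,dx = \frac{1}{\sqrt{2\pi}}\left\{x - \frac{x^3}{2\cdot 3} + \frac{x^5}{2!\,2^2\cdot 5} - \frac{x^7}{3!\,2^3\cdot 7} + \text{etc.}\right\} \tag{5.56}$$

This converges too slowly to be of use for other than small values of x.
If x is large an asymptotic series may be employed. We have

$$\frac{1}{\sqrt{2\pi}}\int_x^\infty e^{-\frac{1}{2}x^2}\,dx = \frac{1}{\sqrt{2\pi}}\int_x^\infty \frac{1}{x}\,x e^{-\frac{1}{2}x^2}\,dx = \frac{1}{\sqrt{2\pi}}\left[-\frac{e^{-\frac{1}{2}x^2}}{x}\right]_x^\infty - \frac{1}{\sqrt{2\pi}}\int_x^\infty \frac{e^{-\frac{1}{2}x^2}}{x^2}\,dx$$

and on repeating the partial integration,

$$\frac{1}{\sqrt{2\pi}}\int_x^\infty e^{-\frac{1}{2}x^2}\,dx = \frac{e^{-\frac{1}{2}x^2}}{x\sqrt{2\pi}}\left\{1 - \frac{1}{x^2} + \frac{1\cdot 3}{x^4} - \frac{1\cdot 3\cdot 5}{x^6} + \ldots\right\} + R_n, \tag{5.57}$$

where $R_n$ is less in absolute value than the last term taken into account.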

The most useful formula, however, is a continued fraction due to Laplace. Put

$$a(t) = \frac{1}{1-t} - \frac{1}{x^2(1-t)^3} + \frac{1\cdot 3}{x^4(1-t)^5} - \ldots$$

so that a(0) is the expression in curly brackets in (5.57). We then have

$$\frac{1}{x^2}\frac{da}{dt} = \frac{1}{x^2(1-t)^2} - \frac{3}{x^4(1-t)^4} + \ldots = -\{(1-t)a(t) - 1\}.$$

Hence if $a(t) = \sum y_r t^r$ we have, identifying coefficients in $t^r$,

$$\frac{r+1}{x^2}\,y_{r+1} + y_r - y_{r-1} = 0.$$

Hence

$$\frac{y_r}{y_{r-1}} = \frac{1}{1 + \dfrac{r+1}{x^2}\dfrac{y_{r+1}}{y_r}}.$$

Thus

$$\frac{y_1}{y_0} = \cfrac{x}{x + \cfrac{2}{x + \cfrac{3}{x + \ldots}}} \tag{5.58}$$

Now when t = 0, a reduces to $y_0$, and the term independent of t in the differential equation gives $y_1/x^2 = 1 - y_0$, so that $y_0 = 1/\{1 + (y_1/y_0)/x^2\}$. Hence
we have, from (5.57),

$$1 - F(x) = \frac{e^{-\frac{1}{2}x^2}}{\sqrt{2\pi}}\;\cfrac{1}{x + \cfrac{1}{x + \cfrac{2}{x + \cfrac{3}{x + \ldots}}}} \tag{5.59}$$

The continued fraction thus gives the ratio of the frequency of the normal distribution
to the right of the point x (the tail area) to the ordinate of the frequency-distribution
at that point.
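The behaviour of the two expansions for the tail area may be illustrated numerically. The Python sketch below is an assumption-laden illustration, not Sheppard's procedure: the function names, the depth at which the continued fraction is cut off and the number of terms kept in the asymptotic series are all arbitrary choices, and the reference value is taken from the standard library function `math.erfc`.

```python
from math import erfc, exp, pi, sqrt


def ordinate(x):
    """Ordinate of the normal curve in standard measure."""
    return exp(-0.5 * x * x) / sqrt(2.0 * pi)


def tail_reference(x):
    """Tail area to the right of x, from the complementary error function."""
    return 0.5 * erfc(x / sqrt(2.0))


def tail_continued_fraction(x, depth=200):
    """Tail area from 1/(x + 1/(x + 2/(x + 3/(x + ...)))), cut off at `depth`."""
    value = x
    for k in range(depth, 0, -1):   # build the fraction from the bottom upwards
        value = x + k / value
    return ordinate(x) / value


def tail_asymptotic(x, terms=4):
    """Tail area from the first `terms` terms of 1 - 1/x^2 + 1.3/x^4 - 1.3.5/x^6 + ..."""
    total, term = 0.0, 1.0
    for k in range(terms):
        total += term
        term *= -(2 * k + 1) / (x * x)
    return ordinate(x) / x * total


for x in (2.0, 3.0, 4.0):
    print(x, tail_reference(x), tail_continued_fraction(x), tail_asymptotic(x))
```

For moderate x the continued fraction agrees closely with the reference value, whereas the asymptotic series is useful only when x is fairly large.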

This expression was in fact used by Sheppard (1939, posthumous) in calculating his
superb tables of the normal function. These tables give, among other things:

(a) The ratio of the tail area to the bounding ordinate, to 12 places of decimals, for
intervals 0.01.

(b) The same to 24 places for intervals 0.1.

(c) The negative natural logarithm of the tail area, to sixteen places, by intervals 0.1.

Tables which are sufficient for all ordinary purposes will be found in Tables for Statisticians
and Biometricians, Parts I and II. At the end of this volume we give some tables which
will suffice to illustrate the theory and examples given herein.


5.20. The shape of the normal curve

$$y = \frac{1}{\sqrt{2\pi}}\,e^{-\frac{1}{2}x^2}$$

is illustrated in Fig. 5.1. It is symmetrical and unlimited in range, falling off to zero
very rapidly as the variate increases. There are points of inflection at unit distance on
either side of the mean.





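For the mean deviation we have

$$\frac{1}{\sqrt{2\pi}}\int_{-\infty}^{\infty}|x|\,e^{-\frac{1}{2}x^2}\,dx = \frac{2}{\sqrt{2\pi}}\int_0^\infty x e^{-\frac{1}{2}x^2}\,dx = \sqrt{\frac{2}{\pi}} = 0.79788. \tag{5.60}$$

The variance is of course unity, because the distribution is expressed in standard measure.
The quartiles are distant 0.674,489,75 from the mean, as may be found from the Tables.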
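Both constants can be confirmed with a few lines of Python. This is a sketch only; it relies on `statistics.NormalDist` from the standard library (Python 3.8 or later), and the variable names are introduced here for illustration.

```python
from math import pi, sqrt
from statistics import NormalDist

# Mean deviation of the normal distribution in standard measure, sqrt(2/pi).
mean_deviation = sqrt(2.0 / pi)

# Distance of the quartiles from the mean: the point with 75 per cent of the area below it.
upper_quartile = NormalDist().inv_cdf(0.75)

print(mean_deviation, upper_quartile)
```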



Fig. 5.1. The Normal Curve $y = \frac{1}{\sqrt{2\pi}}\,e^{-\frac{1}{2}x^2}$.



5.21. As an illustration of the occurrence in practice of a distribution which is very
close to the normal, the height data of Table 1.7 may be taken. Table 5.5 shows the
actual frequencies and those given by the normal curve with the same mean and standard
deviation (67.46 and 2.56 inches respectively).

The correspondence is evidently fairly good. It must, however, be noted that whereas
the theoretical distribution has infinite range, the practical distribution has not, since it
is impossible to have a negative height. In this particular case the relative frequency
of the normal distribution outside the range 57 to 77 inches is so small that the point is
unimportant; but when distributions of finite range are represented by those of infinite
range it is as well to remember that the fit near the tails may not be very close.

5.22. The normal distribution has had a curious history. It was first discovered
by De Moivre in 1733 as the limiting form of the binomial, but was apparently forgotten
and rediscovered later in the eighteenth century by workers engaged in investigating the
theory of probability and the theory of errors. The discovery that errors of observation




TABLE 5.5

Frequency-Distribution of 8585 Men according to Height (Table 1.7) compared with Theoretical
Frequencies of a Normal Distribution with the Same Mean and Variance.

Height        Observed       Theoretical
(inches)      Frequency      Frequency

57-                2               1
58-                4               3
59-               14              11
60-               41              33
61-               83              88
62-              169             200
63-              394             396
64-              669             669
65-              990             976
66-             1223            1227
67-             1329            1326
68-             1230            1234
69-             1063             989
70-              646             682
71-              392             405
72-              202             207
73-               79              91
74-               32              34
75-               16              11
76-                5               3
77-                2               1

Totals          8585            8586


ought, on certain plausible hypotheses, to be distributed normally led to a general belief 
that they were so distributed. The belief extended itself to distributions such as those 
of height, in which the variate-value of an individual may be regarded as the cumulation 
of a large number of small effects. Vestiges of this dogma are still found in textbooks. 

It was found in the latter half of the nineteenth century that the frequency-distributions 
occurring in practice are rarely of the normal type and it seemed that the normal distri¬ 
bution was due to be discarded as a representation of natural phenomena. But as the 
importance of the distribution declined in the observational sphere it grew in the theoretical,
particularly in the theory of sampling. It is in fact found that many of the distributions 
arising in that theory are either normal or sufficiently close to normality to permit satis¬ 
factory approximations by the use of the normal distribution. Furthermore, by a fortunate 
accident (if one may speak of accidents in mathematics) it happens that the analytic form 
of the normal distribution is particularly well adapted to the requirements of sampling 
theory. For these and other reasons which will be amply illustrated in the sequel, the 
normal distribution is pre-eminent among the distributions of statistical theory. 


5.23. Since the normal distribution may be considered as the limit of the binomial
it is natural to inquire into the limiting forms, if any, of the hypergeometric distribution.
From (5.38) we see that the difference between two successive terms in the distribution is

$$\frac{n!}{(r+1)!\,(n-r-1)!}\frac{(Np)^{[r+1]}(Nq)^{[n-r-1]}}{N^{[n]}}\left\{1 - \frac{(r+1)(Nq-n+r+1)}{(n-r)(Np-r)}\right\}$$

$$= \frac{n!}{r!\,(n-r)!}\frac{(Np)^{[r]}(Nq)^{[n-r]}}{N^{[n]}}\left\{\frac{nNp - Nq + n - 1 - r(N+2)}{(r+1)(Nq-n+r+1)}\right\}.$$

The ratio of this difference to the (r + 1)th term, here written $y_r$, is then

$$\frac{\Delta y_r}{y_r} = \frac{A + Br}{C + Dr + Er^2},$$





where the quantities A, . . ., E are constants. In the limit, when the distribution is
expressed in standard measure, $\Delta y_r$ is the increment when r increases by a small quantity,
and we are thus led to consider the differential equation defining a frequency function

$$\frac{1}{f}\frac{df}{dx} = \frac{A + Bx}{C + Dx + Ex^2}. \tag{5.61}$$

This is the equation of a family of functions, the Pearson distributions, which will be
considered from a slightly different standpoint in the next chapter.
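The statement of 5.23 that the ratio of the difference of successive hypergeometric terms to the term itself is a linear function of r divided by a quadratic can be checked numerically. In the sketch below the explicit numerator and denominator are those obtained by writing out the ratio of successive terms of (5.38); the function names and the parameter values are assumptions made for illustration.

```python
from math import comb


def term(r, N, n, Np):
    """Term of the hypergeometric distribution containing r P's."""
    return comb(Np, r) * comb(N - Np, n - r) / comb(N, n)


def difference_ratio(r, N, n, Np):
    """(A + B r)/(C + D r + E r^2) written out explicitly for the hypergeometric."""
    Nq = N - Np
    numerator = n * Np - Nq + n - 1 - r * (N + 2)      # linear in r
    denominator = (r + 1) * (Nq - n + r + 1)           # quadratic in r
    return numerator / denominator


# Assumed illustrative case: N = 52, n = 13, Np = 13.
N, n, Np = 52, 13, 13
for r in range(0, 6):
    empirical = (term(r + 1, N, n, Np) - term(r, N, n, Np)) / term(r, N, n, Np)
    print(r, empirical, difference_ratio(r, N, n, Np))
```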


The Bivariate Binomial Distribution 

5.24. In generalisation of the results of 5.2, consider the drawing of samples of n
from a population the individuals of which may or may not have two attributes, P and
not-P (= Q) and R and not-R (= S). Suppose that the proportions of the individuals
with attributes PR, QR, PS and QS are a, b, c and d respectively, where a + b + c + d = 1.
In exactly the same way as for the binomial case it is seen that the proportion of samples
with i PR's, j QR's, k PS's and l QS's is

$$\frac{n!}{i!\,j!\,k!\,l!}\,a^i b^j c^k d^l$$

and the distribution of samples is given by the multinomial form

$$(a + b + c + d)^n. \tag{5.62}$$

The distribution given by this form is bivariate, one variate being the number of P's and
the other the number of R's. The characteristic function of the distribution is

$$\phi = \left(a e^{it_1 + it_2} + b e^{it_2} + c e^{it_1} + d\right)^n. \tag{5.63}$$

We have then

$$\log\phi = n\log\left\{a + b + c + d + ai(t_1+t_2) + bit_2 + cit_1 - \tfrac{1}{2}a(t_1+t_2)^2 - \tfrac{1}{2}bt_2^2 - \tfrac{1}{2}ct_1^2 - \ldots\right\}$$

$$= n\left\{i(a+b)t_2 + i(a+c)t_1 - \tfrac{1}{2}(a+c)(1-a-c)t_1^2 - \tfrac{1}{2}(a+b)(1-a-b)t_2^2 - \{a - (a+c)(a+b)\}t_1t_2 + \ldots\right\}. \tag{5.64}$$

From this it is seen that the mean of the variate corresponding to the occurrence of P's
is n(a + c), and that of the variate corresponding to the occurrence of the R's, n(a + b).
From the terms in $t_1^2$ and $t_2^2$ in the expansion of (5.64) we find also that the variances are
n(a + c)(1 - a - c) and n(a + b)(1 - a - b). If we now transfer the origin to the mean
of the variates we have

$$\log\phi = -\frac{n}{2}\left\{t_1^2(a+c)(1-a-c) + t_2^2(a+b)(1-a-b) + 2t_1t_2\{a - (a+c)(a+b)\}\right\} + O(n).$$

Thus when the distribution is expressed in standard measure and n allowed to tend
to infinity the characteristic function tends to the form

$$\log\phi = -\tfrac{1}{2}\left(t_1^2 + t_2^2 + 2\rho t_1t_2\right), \tag{5.65}$$

where

$$\rho = \frac{a - (a+c)(a+b)}{\{(a+c)(1-a-c)(a+b)(1-a-b)\}^{\frac{1}{2}}}.$$

This, as was seen in Example 3.15, is the characteristic function of the bivariate form

$$dF = \frac{1}{2\pi(1-\rho^2)^{\frac{1}{2}}}\exp\left\{-\frac{x_1^2 - 2\rho x_1x_2 + x_2^2}{2(1-\rho^2)}\right\}dx_1\,dx_2,\qquad -\infty \le x_1, x_2 \le \infty. \tag{5.66}$$

Thus the multinomial form (5.62) tends to the form (5.66), which may be regarded as the
bivariate analogue of the normal distribution.
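The quantity $\rho$ of (5.65) is in fact the correlation between the number of P's and the number of R's for every n, not only in the limit, and this may be verified by direct enumeration of the multinomial terms for a small sample. The Python sketch below does so; the function names and the proportions used are illustrative assumptions.

```python
from math import factorial, sqrt


def rho(a, b, c):
    """Correlation parameter of (5.65); d = 1 - a - b - c does not enter explicitly."""
    return (a - (a + c) * (a + b)) / sqrt(
        (a + c) * (1 - a - c) * (a + b) * (1 - a - b))


def exact_correlation(n, a, b, c):
    """Correlation of (number of P's, number of R's) by enumerating (a + b + c + d)^n."""
    d = 1.0 - a - b - c
    e1 = e2 = e11 = e22 = e12 = 0.0
    for i in range(n + 1):                    # i PR's
        for j in range(n + 1 - i):            # j QR's
            for k in range(n + 1 - i - j):    # k PS's
                l = n - i - j - k             # l QS's
                prob = (factorial(n)
                        / (factorial(i) * factorial(j) * factorial(k) * factorial(l))
                        * a**i * b**j * c**k * d**l)
                x1, x2 = i + k, i + j         # numbers of P's and of R's
                e1 += prob * x1
                e2 += prob * x2
                e11 += prob * x1 * x1
                e22 += prob * x2 * x2
                e12 += prob * x1 * x2
    cov = e12 - e1 * e2
    return cov / sqrt((e11 - e1 * e1) * (e22 - e2 * e2))


# Assumed proportions a, b, c = 0.4, 0.1, 0.2 (so d = 0.3) and a sample of n = 5.
print(rho(0.4, 0.1, 0.2), exact_correlation(5, 0.4, 0.1, 0.2))
```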




If the two attributes P and R are independent in the population, that is to say, the
proportion of P's among R's is the same as among the not-R's, we have

$$\frac{a}{a+b} = \frac{c}{c+d}$$

and hence

$$\frac{a}{a+b} = \frac{a+c}{a+b+c+d} = a + c,$$

so that a - (a + b)(a + c) = 0. Thus $\rho$ in equation (5.65) vanishes. In this case and
only in this case the distribution (5.66) becomes

$$dF = \frac{1}{\sqrt{2\pi}}\,e^{-\frac{1}{2}x_1^2}\,dx_1\cdot\frac{1}{\sqrt{2\pi}}\,e^{-\frac{1}{2}x_2^2}\,dx_2,$$

i.e. $x_1$ and $x_2$ are independent variables. This is what we should expect and, indeed, is
necessary if our use of the word "independent" in relation to attributes and frequency-distributions
is to be consistent.


NOTES AND REFERENCES 


For further formulae about the constants of the binomial distribution, including the
incomplete moments, see Frisch (1926) and Romanovsky (1925). Some of
the results are given as exercises below. See also Haldane (1939). For the formulae of
the hypergeometric distribution see K. Pearson (1895 and 1924). On the distribution
functions of the binomial and the hypergeometric, see Camp (1924 and 1925).

On Poisson's distribution reference may be made to Whitaker (1914), “Student”
(1907 and 1919) and Morant (1921).


Camp, B. H. (1924), “Probability integrals for the point binomial,” Biometrika, 16, 163.

Camp, B. H. (1925), “Probability integrals for the hypergeometrical series,” Biometrika, 17, 61.

Frisch, R. (1926), see refs. to Chapter 3.

Haldane, J. B. S. (1939), “The cumulants and moments of the binomial,” Biometrika,
31, 392.

Morant, G. (1921), “On random occurrences in space and time when followed by a closed
interval,” Biometrika, 13, 309.

Pearson, K. (1895), “Skew variation in homogeneous material,” Phil. Trans. A, 186, 343.

Pearson, K. (1924a), “On the moments of the hypergeometrical series,” Biometrika, 16, 157.

Pearson, K. (1924b), “On a certain double hypergeometrical series and its representation by
continuous frequency surfaces,” Biometrika, 16, 172.

Romanovsky, V. (1925), “On the moments of the hypergeometrical series,” Biometrika,
17, 57.

Sheppard, W. F. (1939), The Probability Integral, British Association Mathematical Tables,
vol. 7, Cambridge University Press.

Soper, H. E. (1922), Frequency Arrays, Cambridge University Press.

“Student” (1907), “On the error of counting with a haemacytometer,” Biometrika, 5, 351.

“Student” (1919), “An explanation of deviations from Poisson's law in practice,” Biometrika,
12, 211.

Whitaker, L. (1914), “On Poisson's law of small numbers,” Biometrika, 10, 36.





EXERCISES 

5.1. Show that for the binomial distribution $(q + p)^n$

$$\kappa_{r+1} = pq\,\frac{d\kappa_r}{dp},\qquad r \ge 1.$$

Hence, writing c = pq, g = q - p, that the cumulants are n times the following values:

$\kappa_2 = c$; $\kappa_3 = cg$; $\kappa_4 = c - 6c^2$; $\kappa_5 = g(c - 12c^2)$; $\kappa_6 = c - 30c^2 + 120c^3$;

$\kappa_7 = g(c - 60c^2 + 360c^3)$; $\kappa_8 = c - 126c^2 + 1680c^3 - 5040c^4$.

(Cf. Frisch (1926) and Haldane (1939), who give formulae of higher order.)


5.2. Show that for the incomplete moments about the mean of the binomial
equation (5.12) holds.

(Romanovsky, 1925.)


5.3. Writing = 
are given by 

n 

fit — 

jm.p 



^ show that the incomplete moments of the binomial 


i«. = pgT, 

Pi ■= pgTJp -{n + l)p} + npqpt 

fit = pqT^[{p — (n + l)p}* + pq (2w — l)]+ npq {q — p) fit 
and generally 

r— 2 

Pf = pqT.ip - npf-'^ + npq [ 7 )f'i “ P 

(Frisch, 1926. This is the generalisation of equation (5.11) to incomj)lete moments.) 



5.4. Show that about the origin of the hypergeometric distribution 


5.5. From equation (5.48) derive the recurrence formula for the moments of the binomial

$$\{(1+E)^r - E^r\}(npq\,\mu_0 - p\,\mu_1) = \mu_{r+1}$$

and that for the Poisson distribution

$$\{(1+E)^r - E^r\}\lambda\mu_0 = \mu_{r+1}.$$

(K. Pearson, 1924.)





5.6. Show that if

$$y = \frac{1}{\sigma\sqrt{2\pi}}\,e^{-\frac{x^2}{2\sigma^2}},\qquad \int_{-\infty}^{\infty} y^2\,dx = \frac{1}{2\sigma\sqrt{\pi}}.$$

Hence, if a normal distribution is grouped in intervals with total frequency $N_1$ and $N_2$ is
the sum of the squares of frequencies, an estimate of $\sigma$ is

$$\frac{N_1^2}{2N_2\sqrt{\pi}} = 0.282,095\,\frac{N_1^2}{N_2}.$$

For the height data of Table 1.7 show that this gives an estimate of $\sigma$ equal to 2.55 in., an
error of about 1 per cent.

(Yule (1938), Biometrika, 30, 1.)


5.7. If a distribution of type (5.24) is represented approximately by a binomial
$(Q + P)^{\nu}$, show that

$$\nu P = np,\qquad \nu PQ = npq - n\,\mathrm{var}\,p,$$

so that $P = p + \dfrac{\mathrm{var}\,p}{p}$ and hence is positive; consequently that $\nu$ is positive.

If, however, the distribution is of type (5.28), then

$$P = p - \frac{(n-1)\,\mathrm{var}\,p}{p},$$

so that P, and hence $\nu$, may be negative.

(“Student,” 1919.)


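5.8. The bivariate Poisson series. Show that when a, b and c in equation (5.62)
are small but na (= $\lambda_3$), nb (= $\lambda_1 - \lambda_3$) and nc (= $\lambda_2 - \lambda_3$) are finite, the distribution tends
to the form whose general term is

$$e^{\lambda_3 - \lambda_1 - \lambda_2}\,\frac{\lambda_3^{\,i}(\lambda_1 - \lambda_3)^{j}(\lambda_2 - \lambda_3)^{k}}{i!\,j!\,k!}.$$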

5.9. Show that if the frequencies of two symmetrical binomial forms of degree n 
are superposed so that the rth term of one is added to the (r + 1)th term of the other, the
resultant frequencies are those of a symmetrical binomial of degree (n + 1). Deduce that 
if two normal distributions with the same variance and means differing by a small part 
of the variance are added together, the resultant distribution is nearly normal. 



CHAPTER 6


STANDARD DISTRIBUTIONS—(2) 

6.1. In this chapter we continue the account, begun in the last, of the standard
distributions of statistical theory. From the variety of forms assumed by the frequency- 
distributions of experience, as exemplified in Chapter 1, it is evident that an elastic system 
would be required to describe them all in mathematical terms. Three approaches will 
be considered herein : the first, due to Karl Pearson, seeks to ascertain a family of curves 
which will satisfactorily represent practical distributions ; the second, due to Bruns, Gram 
and Charlier, seeks to represent a given frequency function as a series of derivatives of 
the normal frequency function ; the third, due to Edgeworth, seeks for a transformation 
of the variate which will throw the distribution at least approximately into the normal form. 


Pearson Distributions 

6.2. It was noted in 5.23 that in the limiting case the hypergeometric series can
be expressed in the form

$$\frac{df}{dx} = \frac{(x - a)f}{b_0 + b_1x + b_2x^2}. \tag{6.1}$$

This equation may be considered from a slightly different standpoint. The unimodal
distributions of Chapter 1 suggest that it might be worth while examining the class of
frequency functions which (a) have a single mode, so that $\dfrac{df}{dx}$ vanishes at some point x = a;
(b) have smooth contact with the x-axis at the extremities, so that $\dfrac{df}{dx}$ vanishes when f = 0.
Evidently these conditions are in general obeyed by any distribution of the family (6.1).
In actual fact, as will be seen below, there are also solutions of (6.1) in particular cases
which are J- or U-shaped.

The family of frequency functions defined by (6.1) are known as Pearson distributions.
Before obtaining explicit solutions of the equation, we consider certain general results
which are true of all members of the system. We have immediately

$$(b_0 + b_1x + b_2x^2)\,df = (x - a)f\,dx$$

or

$$x^n(b_0 + b_1x + b_2x^2)\frac{df}{dx}\,dx = x^n(x - a)f\,dx.$$

Integrating the left-hand side by parts over the range of the distribution, we find, assuming
that the integrals exist,

$$\left[x^n(b_0 + b_1x + b_2x^2)f\right]_{-\infty}^{\infty} - \int_{-\infty}^{\infty}\{nb_0x^{n-1} + (n+1)b_1x^n + (n+2)b_2x^{n+1}\}f\,dx = \int_{-\infty}^{\infty}x^{n+1}f\,dx - a\int_{-\infty}^{\infty}x^nf\,dx. \tag{6.2}$$

Let us assume that the expression in square brackets vanishes at the extremities of the
distribution, i.e. that $\lim_{x\to\pm\infty} x^{n+2}f \to 0$ if the range is infinite. We then have, substituting
moments for integrals in (6.2),

$$-nb_0\mu'_{n-1} - (n+1)b_1\mu'_n - (n+2)b_2\mu'_{n+1} = \mu'_{n+1} - a\mu'_n$$

or

$$nb_0\mu'_{n-1} + \{(n+1)b_1 - a\}\mu'_n + \{(n+2)b_2 + 1\}\mu'_{n+1} = 0. \tag{6.3}$$

This equation permits of the determination of any moment from those of lower orders.
In fact, all moments can be expressed in terms of a, $b_0$, $b_1$ and $b_2$ and the moments $\mu'_0$ (= 1)
and $\mu'_1$. Conversely we can express these four constants in terms of the moments $\mu'_1$ to $\mu'_4$,
or the three moments about the mean $\mu_2$ to $\mu_4$. Putting n = 0, 1, 2, 3, successively in
(6.3), we find equations for a, $b_0$, $b_1$, $b_2$ which result in

$$\begin{aligned}
a &= -\frac{\mu_3(\mu_4 + 3\mu_2^2)}{A} = -\frac{\sqrt{\mu_2}\,\sqrt{\beta_1}\,(\beta_2 + 3)}{A'}\\
b_0 &= -\frac{\mu_2(4\mu_2\mu_4 - 3\mu_3^2)}{A} = -\frac{\mu_2(4\beta_2 - 3\beta_1)}{A'}\\
b_1 &= -\frac{\mu_3(\mu_4 + 3\mu_2^2)}{A} = -\frac{\sqrt{\mu_2}\,\sqrt{\beta_1}\,(\beta_2 + 3)}{A'}\\
b_2 &= -\frac{2\mu_2\mu_4 - 3\mu_3^2 - 6\mu_2^3}{A} = -\frac{2\beta_2 - 3\beta_1 - 6}{A'}
\end{aligned} \tag{6.4}$$

where

$$A = 10\mu_4\mu_2 - 18\mu_2^3 - 12\mu_3^2,\qquad A' = 10\beta_2 - 18 - 12\beta_1.$$

It follows that a curve of the family (6.1) is completely determined by its first four
moments, $\mu'_1$ to $\mu'_4$. The origin, of course, is at the mean.
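The determination of a, $b_0$, $b_1$, $b_2$ from the moments about the mean may be sketched in Python as follows. The function names are hypothetical; the closed forms used are those obtained by solving (6.3) for n = 0, 1, 2, 3 with the origin at the mean, and the second function merely substitutes the constants back into (6.3) as a check.

```python
def pearson_constants(mu2, mu3, mu4):
    """Constants a, b0, b1, b2 of (6.1) from moments about the mean (origin at the mean)."""
    A = 10 * mu4 * mu2 - 18 * mu2**3 - 12 * mu3**2
    a = -mu3 * (mu4 + 3 * mu2**2) / A
    b0 = -mu2 * (4 * mu2 * mu4 - 3 * mu3**2) / A
    b1 = a
    b2 = -(2 * mu2 * mu4 - 3 * mu3**2 - 6 * mu2**3) / A
    return a, b0, b1, b2


def moment_equation_residuals(mu2, mu3, mu4):
    """Left-hand sides of (6.3) for n = 0, 1, 2, 3 with mu'_0 = 1 and mu'_1 = 0."""
    a, b0, b1, b2 = pearson_constants(mu2, mu3, mu4)
    m = [1.0, 0.0, mu2, mu3, mu4]      # moments of orders 0 to 4 about the mean
    residuals = [(b1 - a) * m[0] + (2 * b2 + 1) * m[1]]
    for n in (1, 2, 3):
        residuals.append(n * b0 * m[n - 1]
                         + ((n + 1) * b1 - a) * m[n]
                         + ((n + 2) * b2 + 1) * m[n + 1])
    return residuals


# A normal distribution with variance 4 (mu3 = 0, mu4 = 3 * 4**2) gives a = b1 = b2 = 0.
print(pearson_constants(4.0, 0.0, 48.0))
```

For the normal case the constants reduce to a = $b_1$ = $b_2$ = 0 and $b_0 = -\mu_2$, in agreement with the form of the normal curve.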

6.3. In equation (6.1) the mode is evidently at the point x = a. We have then
for the Pearson measure of skewness (3.31)

$$\mathrm{Sk} = -\frac{a}{\sqrt{\mu_2}} = \frac{\sqrt{\beta_1}\,(\beta_2 + 3)}{10\beta_2 - 12\beta_1 - 18},$$

the form given in 3.31.

Further, if we take an origin at the mode so that a = 0 we find

$$\frac{d^2f}{dx^2} = \frac{d}{dx}\left\{\frac{xf}{b_0 + b_1x + b_2x^2}\right\} = \frac{f\{b_0 + (1 - b_2)x^2\}}{(b_0 + b_1x + b_2x^2)^2}.$$

Thus any points of inflection in the frequency curve are given by

$$x^2 = \frac{b_0}{b_2 - 1}. \tag{6.8}$$

Hence there cannot be more than two of them, and if they exist, they are equidistant from
the mode. It is not to be inferred that a curve of the family cannot have a single point
of inflection, for one point corresponding to the solution of (6.8) may be outside the permissible
range of x.

6.4. By a simple transformation of the origin to the mode, (6.1) may be written

$$\frac{d}{dx}(\log f) = \frac{x - a}{B_0 + B_1(x - a) + B_2(x - a)^2},$$

or, writing X = x - a,

$$\frac{d}{dX}(\log f) = \frac{X}{B_0 + B_1X + B_2X^2}. \tag{6.9}$$



THE PEARSONIAN SYSTEM 




The explicit expression of the frequency function f is thus a matter of integrating the
right-hand side of (6.9).

Following Pearson, we may distinguish three main types according as the denominator
on the right in (6.9) has real roots of opposite sign, real roots of the same sign, or imaginary
roots. Pearson also distinguished ten other types, some entirely trivial, when the B's
take particular values.*


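Type I

6.5. Let

$$B_0 + B_1X + B_2X^2 = B_2(X + \alpha_1)(X - \alpha_2),\qquad \alpha_1, \alpha_2 > 0.$$

Then

$$\frac{d}{dX}\log f = \frac{X}{B_2(X + \alpha_1)(X - \alpha_2)} = \frac{\alpha_1}{B_2(\alpha_1 + \alpha_2)(X + \alpha_1)} + \frac{\alpha_2}{B_2(\alpha_1 + \alpha_2)(X - \alpha_2)},$$

giving

$$f = k(X + \alpha_1)^{\alpha_1/\{B_2(\alpha_1 + \alpha_2)\}}(\alpha_2 - X)^{\alpha_2/\{B_2(\alpha_1 + \alpha_2)\}}. \tag{6.10}$$

This is generally written in the form

$$f = k\left(1 + \frac{x}{a_1}\right)^{m_1}\left(1 - \frac{x}{a_2}\right)^{m_2},\qquad -a_1 \le x \le a_2, \tag{6.11}$$

where

$$\frac{m_1}{a_1} = \frac{m_2}{a_2}.$$

The range of the curve is from $-a_1$ to $a_2$ and by integrating between these values we find

$$1 = k\int_{-a_1}^{a_2}\left(1 + \frac{x}{a_1}\right)^{m_1}\left(1 - \frac{x}{a_2}\right)^{m_2}dx,$$

which, on putting $x = (a_1 + a_2)y - a_1$, reduces to

$$1 = k\,\frac{(a_1 + a_2)^{m_1 + m_2 + 1}}{a_1^{m_1}a_2^{m_2}}\int_0^1 y^{m_1}(1 - y)^{m_2}\,dy = k\,\frac{(a_1 + a_2)^{m_1 + m_2 + 1}}{a_1^{m_1}a_2^{m_2}}\,B(m_1 + 1, m_2 + 1).$$

This determines k and we have

$$f = \frac{a_1^{m_1}a_2^{m_2}}{(a_1 + a_2)^{m_1 + m_2 + 1}B(m_1 + 1, m_2 + 1)}\left(1 + \frac{x}{a_1}\right)^{m_1}\left(1 - \frac{x}{a_2}\right)^{m_2}. \tag{6.12}$$

The origin here is the mode. Taking an origin at the start of the curve we have

$$f = \frac{x^{m_1}(a_1 + a_2 - x)^{m_2}}{(a_1 + a_2)^{m_1 + m_2 + 1}B(m_1 + 1, m_2 + 1)},\qquad 0 \le x \le a_1 + a_2,$$

or again, measuring in units $(a_1 + a_2)$ times the original,

$$f = \frac{1}{B(m_1 + 1, m_2 + 1)}\,x^{m_1}(1 - x)^{m_2},\qquad 0 \le x \le 1. \tag{6.13}$$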


6.6. In these expressions the a's are necessarily positive, but the m's may have any
value greater than -1. They cannot be less because the distribution function of (6.12)
or (6.13) would not then converge.

* The numbering of the types followed herein is that of Elderton (1938). Some variations occur in
earlier literature and the reader must not be surprised to find the normal curve referred to occasionally
as Type VII.





If $m_1, m_2 > 0$ the distribution is evidently unimodal and zero at its extremities. If
one of the m's is between 0 and 1 the corresponding terminal frequency is still zero, but
the frequency curve makes a sharp angle with the x-axis, for $\dfrac{df}{dx}$ is not zero at the terminal.
If one and only one m is less than zero the curve has an infinite ordinate and is thus
J-shaped. If both m's are less than zero the curve is U-shaped.

The condition that $B_0 + B_1X + B_2X^2$ shall have real roots of opposite sign is that
$B_0$ and $B_2$ are of opposite sign, which is equivalent to

$$\frac{B_1^2}{4B_0B_2} < 0$$

or, in terms of $\beta_1$ and $\beta_2$, from (6.4),

$$\frac{\beta_1(\beta_2 + 3)^2}{4(2\beta_2 - 3\beta_1 - 6)(4\beta_2 - 3\beta_1)} < 0. \tag{6.14}$$

The quantity on the left was denoted by Pearson by the letter $\kappa$ and provides a criterion
which will occur again below. The equation remains true even though (6.4) is true of
moments about the origin, because the quantity $\dfrac{B_1^2}{4B_0B_2}$ is invariant under change of origin.

The frequency function of the Type I curve is calculable directly from its equation.
The distribution function, as may be seen from (6.13), is expressible in terms of incomplete
B-functions.

Type VI

6.7. If the roots of $B_0 + B_1X + B_2X^2$ are real and of the same sign it is easy to see,
in the manner of the preceding sections, that the frequency functions may be written in
the form

$$f = \frac{a^{q_1 - q_2 - 1}}{B(q_1 - q_2 - 1,\ q_2 + 1)}\,x^{-q_1}(x - a)^{q_2},\qquad\text{where } q_1 > q_2 + 1, \tag{6.15}$$

where the range lies from a to $\infty$ if a is positive and from $-\infty$ to a if a is negative. By
the simple transformation $y = a/x$ this reduces to the Type I form (6.13).

It will readily be verified that if $q_2 > 0$ the curves are unimodal with zero frequencies
at the terminals. If $q_2 < 0$ the start is J-shaped and the distribution falls away to zero
at infinity. The distribution function may be expressed in terms of incomplete B-functions,
and in this case the quantity $\kappa$ of (6.14) is greater than unity.

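Type IV

6.8. If the roots of $B_0 + B_1X + B_2X^2$ are imaginary we have

$$\frac{d}{dX}\log f = \frac{X}{B_2\{(X + \gamma)^2 + \delta^2\}},$$

giving

$$\log f = \log k + \frac{1}{2B_2}\log\{(X + \gamma)^2 + \delta^2\} - \frac{\gamma}{B_2\delta}\tan^{-1}\frac{X + \gamma}{\delta},$$

$$f = k\{(X + \gamma)^2 + \delta^2\}^{\frac{1}{2B_2}}\exp\left\{-\frac{\gamma}{B_2\delta}\tan^{-1}\frac{X + \gamma}{\delta}\right\}. \tag{6.16}$$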

SPECIAL TYPES OF THE PEARSON DISTRIBUTION 




This is Pearson's Type IV and is usually written in the form

$$f = k\left(1 + \frac{x^2}{a^2}\right)^{-m}\exp\left\{-\nu\tan^{-1}\frac{x}{a}\right\},\qquad -\infty \le x \le \infty. \tag{6.17}$$

The distribution has unlimited range in both directions, tends to zero at infinity and is
unimodal. The calculation of its ordinates may be assisted by some tables by Comrie
(1939). The distribution function has to be found either by quadratures from the frequency
function or by the use of some tabulated integrals given in Tables for Statisticians and
Biometricians, Part I. For instance, for the constant k in (6.17) we have

$$\frac{1}{k} = \int_{-\infty}^{\infty}\left(1 + \frac{x^2}{a^2}\right)^{-m}e^{-\nu\tan^{-1}\frac{x}{a}}\,dx = a\int_{-\frac{\pi}{2}}^{\frac{\pi}{2}}\cos^{2m-2}\theta\;e^{-\nu\theta}\,d\theta = aF(2m - 2, \nu)\ \text{in Pearson's notation.}$$

In this case the quantity $\kappa$ of (6.14) lies between 0 and 1.

The above are the three main types in the Pearsonian system. The remaining types
are described briefly below. A number of results which the reader can easily verify for
himself are given without proof.


The Normal Distribution 

6.9. If, in equation (6.9), $B_1 = B_2 = 0$, we have

$$\frac{d\log f}{dX} = \frac{X}{B_0},\qquad f = k\exp\left(\frac{X^2}{2B_0}\right).$$

If this frequency function is to have a convergent distribution function, $B_0$ must be negative,
$= -\sigma^2$ say, and we get the familiar form

$$f = \frac{1}{\sigma\sqrt{2\pi}}\,e^{-\frac{x^2}{2\sigma^2}},\qquad -\infty \le x \le \infty.$$

Thus the normal distribution itself is one of Pearson's types.


Type II

6.10. If in equation (6.9) $B_1 = 0$ and $B_0$, $B_2$ are of opposite signs, the distribution,
a particular case of Type I, becomes of the character

$$f = \frac{1}{aB(\frac{1}{2}, m + 1)}\left(1 - \frac{x^2}{a^2}\right)^{m},\qquad -a \le x \le a, \tag{6.18}$$

(a here being different from the a of (6.9)).

In this case the criterion $\kappa$ of equation (6.14) is zero. The distribution is symmetrical
about the origin and ranges from -a to +a. If m > 0 it is unimodal with contact at
the terminals of the range; if m < 0 it is U-shaped. If m = 0 the distribution becomes

$$f = \frac{1}{2a},\qquad -a \le x \le a, \tag{6.19}$$

the so-called "rectangular" distribution.

Type VII

6.11. If in (6.9) $B_1 = 0$ and $B_0$, $B_2$ are of the same sign, we find

$$f = \frac{1}{aB(m - \frac{1}{2}, \frac{1}{2})}\left(1 + \frac{x^2}{a^2}\right)^{-m},\qquad -\infty \le x \le \infty. \tag{6.20}$$

The range is now unlimited in both directions. Here also the criterion $\kappa$ of (6.14) vanishes,
but the difference between this case and that of Type II lies in the fact that here $\beta_2 > 3$,
whereas in the Type II case $\beta_2 < 3$.


Type III

6.12. If in (6.9) $B_2 = 0$ we obtain the distribution

$$f = k\left(1 + \frac{x}{a}\right)^{p}e^{-\frac{px}{a}},\qquad -a \le x \le \infty, \tag{6.21}$$

this being the form with the origin at the mode. The curve is unlimited in one direction
(positive or negative as $\dfrac{p}{a}$ is positive or negative). It is unimodal if p > 0, J-shaped if
p < 0. The condition $B_2 = 0$, from (6.4), is equivalent to $2\beta_2 - 3\beta_1 - 6 = 0$, i.e. $\kappa$ of
(6.14) is infinite.

Type V

6.13. If the roots of $B_0 + B_1X + B_2X^2$ are equal, i.e. $\kappa = 1$, we arrive at the distribution

$$f = \frac{\gamma^{p-1}}{\Gamma(p - 1)}\,x^{-p}e^{-\frac{\gamma}{x}},\qquad 0 \le x \le \infty, \tag{6.22}$$

which ranges from 0 to $\infty$ and is unimodal.

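Types VIII, IX, X, XI and XII

6.14. The remaining types are of a more special character still.

If in (6.9) $B_0 = 0$, $B_1 > 0$ we have

$$\text{Type VIII}:\quad f = \frac{1 - m}{a}\left(1 + \frac{x}{a}\right)^{-m},\qquad 0 \le m \le 1,\quad -a \le x \le 0. \tag{6.23}$$

If $B_0 = 0$, $B_1 < 0$ we have

$$\text{Type IX}:\quad f = \frac{1 + m}{a}\left(1 + \frac{x}{a}\right)^{m},\qquad -a \le x \le 0. \tag{6.24}$$

If $B_0 = B_2 = 0$ we have

$$\text{Type X}:\quad f = \frac{1}{\sigma}\,e^{-\frac{x}{\sigma}},\qquad 0 \le x \le \infty. \tag{6.25}$$

If $B_0 = B_1 = 0$ we have

$$\text{Type XI}:\quad f = b^{m-1}(m - 1)\,x^{-m},\qquad b \le x \le \infty. \tag{6.26}$$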




Finally, as a particular case of Type I when $5\beta_2 - 6\beta_1 - 9 = 0$, equations (6.4) become
indeterminate. In this case we have

$$\text{Type XII}:\quad f = \frac{1}{(a_1 + a_2)B(1 + m, 1 - m)}\left(\frac{a_1 + x}{a_2 - x}\right)^{m},\qquad |m| < 1,\quad -a_1 \le x \le a_2. \tag{6.27}$$

6.15. Pearson curves of Types I and III, and to a somewhat smaller extent, those
of Types V and VII, arise in the theory of sampling and would in any case have to be
studied in that theory. Apart from this, the principal use that has been made of the
distributions in the theory of statistics is in fitting them to observed distributions such as
those of Chapter 1. It has been found that in many cases the Pearson distributions provide
a remarkably good fit to observation.

A systematic account of the technique of fitting will be found in Elderton's Frequency
Curves and Correlation (1938). We will here merely indicate the general principles and
give one example of fitting in what is, perhaps, the most difficult case.


6.16. All the Pearson distributions are determined by the first four moments, $\mu_1'$
to $\mu_4$ inclusive, except some of the degenerate types which are determined by fewer than
four moments. Pearson's method of fitting consists of

(1) determining the numerical values of the first four moments of the observed distribution;

(2) calculating the numerical values of $\beta_1$, $\beta_2$, $\kappa$ (equation 6.14) and hence determining
the type to which the distribution belongs;

(3) equating the observed moments to the moments of the appropriate distribution
expressed in terms of its parameters; and

(4) solving the resulting equations for those parameters, whereupon the distribution
is determined.
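Step (2) of the method can be sketched in a few lines of Python. This is only an illustrative sketch: the function names are hypothetical, the criterion is the expression for $\kappa$ used in Example 6.1, and only the three main types are distinguished (the assignment of $\kappa < 0$ to Type I and $\kappa > 1$ to Type VI is the usual Pearson convention and is an assumption here, since this section states only the Type IV range). The transitional cases ($\kappa = 0$, $\kappa = 1$, $\kappa$ infinite) are deliberately not handled.

```python
def pearson_criteria(mu2, mu3, mu4):
    """Return (beta1, beta2, kappa) from the central moments mu2, mu3, mu4."""
    beta1 = mu3 ** 2 / mu2 ** 3
    beta2 = mu4 / mu2 ** 2
    # kappa = beta1 (beta2 + 3)^2 / {4 (4 beta2 - 3 beta1)(2 beta2 - 3 beta1 - 6)}
    denominator = 4.0 * (4.0 * beta2 - 3.0 * beta1) * (2.0 * beta2 - 3.0 * beta1 - 6.0)
    kappa = beta1 * (beta2 + 3.0) ** 2 / denominator
    return beta1, beta2, kappa


def main_type(kappa):
    """Main Pearson type only (assumed convention); transitional types are ignored."""
    if kappa < 0.0:
        return "I"
    if kappa < 1.0:
        return "IV"
    return "VI"


# Central moments of the bean-length data quoted in Example 6.1.
BETA1, BETA2, KAPPA = pearson_criteria(3.238424951, -5.306566352, 50.999624044)
print(BETA1, BETA2, KAPPA, main_type(KAPPA))
```

With the bean-length moments the criterion falls between 0 and 1, so the sketch selects Type IV.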


The following example will illustrate the process:

Example 6.1

In Table 1.15 there is shown, in the column totals, a distribution of 9440 beans according
to length. The figures are repeated in Table 6.1 on page 150. Required to fit a
Pearson distribution to these data.

For the moments it is found that, with Sheppard's corrections,

$\mu_1'$ (centre at 14.5) = -0.190,783,898
$\mu_2$ = 3.238,424,951
$\mu_3$ = -5.306,566,352
$\mu_4$ = 50.999,624,044

$\beta_1$ = 0.829,135,838, $\sqrt{\beta_1}$ = -0.910,569
$\beta_2$ = 4.862,944,362

First of all, as to type. For the criterion $\kappa$ (6.14) we have

$$\kappa = \frac{\beta_1(\beta_2 + 3)^2}{4(4\beta_2 - 3\beta_1)(2\beta_2 - 3\beta_1 - 6)} = \frac{51.262}{84.040}.$$



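This lies between 0 and 1 and hence the appropriate curve is Type IV. We have to determine
$a$, $m$, $\nu$ in

$$f = \frac{1}{aF(2m-2,\,\nu)}\left(1 + \frac{x^2}{a^2}\right)^{-m} e^{-\nu\arctan(x/a)}.$$

Writing $\tan\theta = x/a$ and $2m - 2 = r$ we find

$$\mu_n' = \frac{a^n}{F(r,\nu)}\int_{-\pi/2}^{\pi/2}\cos^{r-n}\theta\,\sin^{n}\theta\;e^{-\nu\theta}\,d\theta,$$

whence, integrating by parts with $\cos^{r-n}\theta\sin\theta$ as one part,

$$\mu_n' = \frac{a}{r-n+1}\left\{(n-1)a\mu_{n-2}' - \nu\mu_{n-1}'\right\},$$

a particular case of (6.3). Hence, in terms of moments about the mean,

$$\mu_1' = -\frac{a\nu}{r}$$
$$\mu_2 = \frac{a^2(r^2+\nu^2)}{r^2(r-1)}$$
$$\mu_3 = -\frac{4a^3\nu(r^2+\nu^2)}{r^3(r-1)(r-2)}$$
$$\mu_4 = \frac{3a^4(r^2+\nu^2)\{(r+6)(r^2+\nu^2) - 8r^2\}}{r^4(r-1)(r-2)(r-3)},$$

whence it is found that

$$r = \frac{6(\beta_2 - \beta_1 - 1)}{2\beta_2 - 3\beta_1 - 6}$$
$$\nu = \frac{r(r-2)\sqrt{\beta_1}}{\sqrt{16(r-1) - \beta_1(r-2)^2}}$$
$$a = \sqrt{\frac{\mu_2}{16}\{16(r-1) - \beta_1(r-2)^2\}}.$$

Substituting for $\beta_1$, $\beta_2$ and $\mu_2$ we find

$r$ = 14.697,72, $m$ = 8.348,86
$\nu$ = 18.380,43, $a$ = 4.159,49

The signs here want a little watching. $r$ and $m$ present no difficulty; but $a$ is to be taken
positive and $\nu$ positive since $\sqrt{\beta_1}$ is to be considered negative.

From the tables of $F(r, \nu)$ we evaluate the constant term $k$ and finally arrive at

$$f = 0.395,121\left(1 + \frac{x^2}{a^2}\right)^{-m} e^{-\nu\arctan(x/a)},$$

with the values of $a$, $m$ and $\nu$ just found.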

The frequencies given by this curve are shown in Table 6.1 on page 150. 
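The arithmetic of steps (3) and (4) for the Type IV case can be checked with a short Python sketch. It assumes the usual moment solution for Type IV ($r$ from $\beta_1$ and $\beta_2$, then $\nu$ and $a$ from the bracket $16(r-1) - \beta_1(r-2)^2$); the function name is hypothetical, and the sign rule for $\nu$ (positive when the skewness is negative) follows the remark on signs in Example 6.1.

```python
import math


def type_iv_parameters(mu2, mu3, mu4):
    """Solve the Type IV moment equations; returns (r, m, nu, a).

    Sketch only: assumes the moments really do fall in the Type IV region,
    so that the bracket below is positive.
    """
    beta1 = mu3 ** 2 / mu2 ** 3
    beta2 = mu4 / mu2 ** 2
    r = 6.0 * (beta2 - beta1 - 1.0) / (2.0 * beta2 - 3.0 * beta1 - 6.0)
    m = 0.5 * (r + 2.0)                      # from 2m - 2 = r
    bracket = 16.0 * (r - 1.0) - beta1 * (r - 2.0) ** 2
    nu = r * (r - 2.0) * math.sqrt(beta1) / math.sqrt(bracket)
    if mu3 > 0.0:                            # nu is positive for negative skewness
        nu = -nu
    a = math.sqrt(mu2 * bracket / 16.0)      # a is always taken positive
    return r, m, nu, a


# Moments of the bean-length data of Example 6.1.
R, M, NU, A = type_iv_parameters(3.238424951, -5.306566352, 50.999624044)
print(R, M, NU, A)
```

Run on the bean-length moments this gives $r$, $m$ and $\nu$ agreeing with the values quoted in the example, and $a$ close to 4.159.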


6.17. The following points are worth noting in connection with the fitting of Pearson
curves to observational data:

(1) Although the various types have dissimilar analytical equations they merge into
one another in geometrical shape. For instance, Type V may be regarded as transitional
between Types IV and VI and is very similar to the shape assumed by those curves
near $\kappa = 1$.

(2) It is tacitly assumed that the data can be represented by a curve with finite moments
up to the fourth order at least. Curves for which higher moments do not exist were
called by Pearson heterotypic; but there is nothing sinister about them except that they
do not fall within the Pearsonian system.

(3) In calculating moments, Sheppard's corrections are usually to be employed when
there are contacts of sufficiently high order at the terminals. In the case of J- or U-distributions
the other corrections mentioned in 3.27 may be employed. This case sometimes
raises difficulties in that the resultant curve does not start in the right place. In such
circumstances there is no golden rule. The most satisfactory course is to try several curves
(or the same curve translated to several points) and to judge by the results which of
them gives the best fit.

(4) The quadrature of Pearson curves, as indicated in the foregoing, may in some 
cases be effected by tabulated integrals ; but the more generally applicable procedure 
appears to be to calculate ordinates direct from the equation of the curve and then to 
find areas in ranges by Simpson’s rule, Weddle’s rule, or some similar process of quadrature. 
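As an illustration of point (4), the following Python sketch applies the composite Simpson's rule to the Type IV curve of Example 6.1. It is a sketch under stated assumptions: the substitution $\tan\theta = x/a$ is used so that the normalising integral becomes $\int \cos^r\theta\, e^{-\nu\theta}\,d\theta$ over $(-\pi/2, \pi/2)$, which is assumed to play the part of the tabulated $F(r, \nu)$; the constants are those of the example, with $a$ taken as 4.159,49 (the value the moment equations give); and the multiplier is taken as $N/(aF)$ with $N = 9440$. The function names are hypothetical.

```python
import math


def simpson(f, lo, hi, n_intervals):
    """Composite Simpson's rule on [lo, hi]; n_intervals must be even."""
    if n_intervals <= 0 or n_intervals % 2:
        raise ValueError("n_intervals must be a positive even integer")
    h = (hi - lo) / n_intervals
    total = f(lo) + f(hi)
    for i in range(1, n_intervals):
        total += (4.0 if i % 2 else 2.0) * f(lo + i * h)
    return total * h / 3.0


# Constants of Example 6.1 (assumed inputs for this sketch).
R, NU, A, N = 14.69772, 18.38043, 4.15949, 9440


def integrand(theta):
    """cos^r(theta) * exp(-nu * theta), the Type IV curve after tan(theta) = x/a."""
    c = max(math.cos(theta), 0.0)   # guard against rounding at the end points
    return c ** R * math.exp(-NU * theta)


F_R_NU = simpson(integrand, -math.pi / 2.0, math.pi / 2.0, 2000)
Y0 = N / (A * F_R_NU)               # multiplier of the fitted frequency curve
print(F_R_NU, Y0)
```

Areas over individual class intervals can be obtained in the same way by passing the appropriate limits to `simpson`.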

6.18. The mathematical description of an observed distribution by a Pearson curve 
may be regarded from two rather different standpoints. If our object (for instance in 
actuarial work) is to obtain a mathematical expression which will satisfactorily represent 
observation and allow of accurate graduation and interpolation, fitting by moments is 
generally satisfactory. The method has, however, been criticised when the observed data
are regarded as samples from a population, and it is desired to find a mathematical
representation of that population. In such cases the moments calculated from observation are
only estimates of population-moments. It has been objected that they may be inefficient 
estimates, and alternative methods have been proposed. We shall have to defer a full 
discussion of this point until the second volume. 


6.19. Other systems of curves have been studied, mainly by Scandinavian writers, 
with a view to representing frequency functions by expansions in series. It is well known 
in mathematical and physical work that functions can often be usefully expressed as a
series of terms such as powers of the variable (Taylor’s series) or trigonometrical functions 
(Fourier’s series). Neither of those forms is very suitable for frequency functions, but 
we proceed to consider another set of functions with more promising possibilities. 


Tchebycheff-Hermite Polynomials

6.20. Writing

$$\alpha(x) = \frac{1}{\sqrt{2\pi}}\,e^{-\frac{1}{2}x^2}$$

and

$$D = \frac{d}{dx}$$

consider successive derivatives of $\alpha(x)$ with respect to $x$. We have

$$D\alpha(x) = -x\,\alpha(x)$$
$$D^2\alpha(x) = (x^2 - 1)\,\alpha(x)$$
$$D^3\alpha(x) = (3x - x^3)\,\alpha(x)$$

and so on. The result will obviously be, in general, a polynomial in $x$ multiplied by $\alpha(x)$.
We then define the Tchebycheff-Hermite polynomial $H_r(x)$ by the equation

$$(-D)^r\alpha(x) = H_r(x)\,\alpha(x). \qquad (6.28)$$

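Evidently $H_r(x)$ is of degree $r$ in $x$ and the coefficient of $x^r$ is unity. By convention $H_0 = 1$.
We have

$$\alpha(x - t) = \alpha(x)\exp\left(tx - \tfrac{1}{2}t^2\right)$$

and also, by Taylor's theorem

$$\alpha(x - t) = \sum_{j=0}^{\infty}\frac{(-t)^j}{j!}D^j\alpha(x) = \sum_{j=0}^{\infty}\frac{t^j}{j!}H_j(x)\,\alpha(x).$$

Consequently $H_r(x)$ is the coefficient of $t^r/r!$ in $\exp\left(tx - \tfrac{1}{2}t^2\right)$. It follows that

$$H_r(x) = x^r - \frac{r^{[2]}}{2\cdot 1!}x^{r-2} + \frac{r^{[4]}}{2^2\cdot 2!}x^{r-4} - \frac{r^{[6]}}{2^3\cdot 3!}x^{r-6} + \ldots \qquad (6.29)$$

where $r^{[j]} = r(r-1)\cdots(r-j+1)$.

The first ten polynomials are

$H_0 = 1$
$H_1 = x$
$H_2 = x^2 - 1$
$H_3 = x^3 - 3x$
$H_4 = x^4 - 6x^2 + 3$
$H_5 = x^5 - 10x^3 + 15x$
$H_6 = x^6 - 15x^4 + 45x^2 - 15$
$H_7 = x^7 - 21x^5 + 105x^3 - 105x$
$H_8 = x^8 - 28x^6 + 210x^4 - 420x^2 + 105$
$H_9 = x^9 - 36x^7 + 378x^5 - 1260x^3 + 945x$
$H_{10} = x^{10} - 45x^8 + 630x^6 - 3150x^4 + 4725x^2 - 945$ . (6.30)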


6.21. The polynomials have a number of interesting properties. Differentiating the
identity

$$\exp\left(tx - \tfrac{1}{2}t^2\right) = \sum_{j=0}^{\infty}\frac{t^j}{j!}H_j(x)$$

with respect to $x$ and identifying coefficients in $t^r$ we have

$$\frac{d}{dx}H_r(x) = rH_{r-1}(x) \qquad (6.31)$$

and generally

$$D^jH_r(x) = r^{[j]}H_{r-j}(x). \qquad (6.32)$$

Differentiating the identity with respect to $t$ and identifying coefficients in $t^{r-1}$ we have

$$H_r(x) - xH_{r-1}(x) + (r-1)H_{r-2}(x) = 0. \qquad (6.33)$$

From (6.31) and (6.33) together we find

$$\frac{d^2H_r}{dx^2} - x\frac{dH_r}{dx} + rH_r = 0. \qquad (6.34)$$
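The recurrence $H_r = xH_{r-1} - (r-1)H_{r-2}$ gives a convenient way of generating the polynomials mechanically. The Python sketch below (function names are hypothetical) builds the coefficient lists from $H_0 = 1$ and $H_1 = x$ and evaluates them by Horner's rule; it can be used to check a printed list of polynomials or to obtain $H_{11}$ and beyond.

```python
def hermite_polys(max_order):
    """Coefficient lists (lowest power first) of H_0 .. H_max_order.

    Built from the recurrence H_r = x H_{r-1} - (r - 1) H_{r-2}.
    """
    polys = [[1], [0, 1]]
    for r in range(2, max_order + 1):
        shifted = [0] + polys[r - 1]        # coefficients of x * H_{r-1}
        padded = polys[r - 2] + [0, 0]      # H_{r-2} padded to the same length
        polys.append([s - (r - 1) * p for s, p in zip(shifted, padded)])
    return polys[: max_order + 1]


def evaluate(coeffs, x):
    """Evaluate a polynomial given lowest-power-first coefficients (Horner's rule)."""
    result = 0
    for c in reversed(coeffs):
        result = result * x + c
    return result


POLYS = hermite_polys(11)
VALUES_AT_ONE = [evaluate(p, 1) for p in POLYS]
print(POLYS[6])
print(VALUES_AT_ONE)
```

The values at $x = 1$ are the ones needed when a series in these polynomials is evaluated at unit deviate.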



It is also known that the equation in $x$, $H_r(x) = 0$, has $r$ real roots, each not greater
in absolute value than $\sqrt{\tfrac{1}{2}r(r-1)}$. (Cf. Charlier, 1931.)

Tables of the values of the first six polynomials to 10 decimal places proceeding by
$x = 0\,(0.01)\,4$ have been given by Jorgensen (1916).


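6.22. The polynomials have an important orthogonal property, namely, that

$$\int_{-\infty}^{\infty}H_m(x)H_n(x)\,\alpha(x)\,dx = 0, \quad m \ne n; \qquad = n!, \quad m = n. \qquad (6.35)$$

In fact, integrating by parts, we have, if $m \le n$,

$$\int_{-\infty}^{\infty}H_mH_n\,\alpha\,dx = (-1)^n\int_{-\infty}^{\infty}H_mD^n\alpha\,dx = (-1)^n\left[H_mD^{n-1}\alpha\right]_{-\infty}^{\infty} + (-1)^{n-1}\int_{-\infty}^{\infty}\frac{dH_m}{dx}D^{n-1}\alpha\,dx.$$

The term in square brackets vanishes and, in virtue of (6.31), the integral becomes

$$m(-1)^{n-1}\int_{-\infty}^{\infty}H_{m-1}D^{n-1}\alpha\,dx.$$

Continuing the process, we find either zero, if $m$ is not equal to $n$, or $m!$ if $m = n$.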
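The orthogonal property is easy to check numerically. The following Python sketch (hypothetical function names) integrates $H_mH_n\alpha$ by Simpson's rule over a range wide enough for the tails to be negligible; the finite range $(-12, 12)$ and the number of intervals are arbitrary choices made for the illustration.

```python
import math


def hermite_value(n, x):
    """H_n(x) from the recurrence H_r = x H_{r-1} - (r - 1) H_{r-2}."""
    if n == 0:
        return 1.0
    previous, current = 1.0, x
    for r in range(2, n + 1):
        previous, current = current, x * current - (r - 1) * previous
    return current


def alpha(x):
    """Normal frequency function in standard measure."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def inner_product(m, n, half_width=12.0, n_intervals=4000):
    """Simpson approximation to the integral of H_m H_n alpha over the real line."""
    h = 2.0 * half_width / n_intervals

    def f(x):
        return hermite_value(m, x) * hermite_value(n, x) * alpha(x)

    total = f(-half_width) + f(half_width)
    for i in range(1, n_intervals):
        total += (4.0 if i % 2 else 2.0) * f(-half_width + i * h)
    return total * h / 3.0


print(inner_product(4, 4), inner_product(2, 4), inner_product(3, 6))
```

The diagonal values come out at $n!$ and the off-diagonal values at zero to within rounding error.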


The Gram-Charlier Series of Type A

6.23. Suppose now that a frequency function can be expanded formally in series
of derivatives of $\alpha(x)$. (We shall discuss the conditions under which such an expansion
is valid below.) We have then

$$f(x) = \sum_{j=0}^{\infty}c_jH_j(x)\,\alpha(x).$$

Multiplying by $H_r(x)$ and integrating from $-\infty$ to $\infty$ we have, in virtue of the orthogonal
relationship (6.35),

$$c_r = \frac{1}{r!}\int_{-\infty}^{\infty}f(x)H_r(x)\,dx. \qquad (6.36)$$

The reader familiar with harmonic analysis will recognise the resemblance between this
procedure and the evaluation of constants in a Fourier series.

Substituting in (6.36) the explicit value of $H_r(x)$ given in (6.29) we find

$$c_r = \frac{1}{r!}\left\{\mu_r' - \frac{r^{[2]}}{2\cdot 1!}\mu_{r-2}' + \frac{r^{[4]}}{2^2\cdot 2!}\mu_{r-4}' - \ldots\right\}. \qquad (6.37)$$

In particular, for moments about the mean,

$c_0 = 1$
$c_1 = 0$
$c_2 = \frac{1}{2}(\mu_2 - 1)$
$c_3 = \frac{1}{6}\mu_3$
$c_4 = \frac{1}{24}(\mu_4 - 6\mu_2 + 3)$
$c_5 = \frac{1}{120}(\mu_5 - 10\mu_3)$
$c_6 = \frac{1}{720}(\mu_6 - 15\mu_4 + 45\mu_2 - 15)$
$c_7 = \frac{1}{5040}(\mu_7 - 21\mu_5 + 105\mu_3)$
$c_8 = \frac{1}{40320}(\mu_8 - 28\mu_6 + 210\mu_4 - 420\mu_2 + 105)$ . (6.38)

Thus we find the formal expansion

$$f(x) = \alpha(x)\left\{1 + \tfrac{1}{2}(\mu_2 - 1)H_2 + \tfrac{1}{6}\mu_3H_3 + \tfrac{1}{24}(\mu_4 - 6\mu_2 + 3)H_4 + \ldots\right\}. \qquad (6.39)$$

If $f(x)$ is in standard measure the series becomes

$$f(x) = \alpha(x)\left\{1 + \tfrac{1}{6}\mu_3H_3 + \tfrac{1}{24}(\mu_4 - 3)H_4 + \ldots\right\}. \qquad (6.40)$$

This is the so-called Gram-Charlier series of Type A.
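The passage from moments to coefficients can be sketched in Python as follows. The sketch assumes only that $c_r$ is $1/r!$ times the expectation of $H_r(x)$, with $H_r$ expanded in powers of $x$; the function name is hypothetical. Two checks are included: the moments of the normal distribution in standard measure give vanishing coefficients beyond $c_0$, and the standardised moments of the bean-length data (quoted in Example 6.2) give the coefficients of the fitted series.

```python
from math import factorial


def type_a_coefficients(moments, max_order):
    """Coefficients c_0 .. c_max_order from moments[0] .. moments[max_order].

    Each c_r is (1/r!) * sum over k of (-1)^k * r! / (k! (r-2k)! 2^k) * moment[r-2k],
    i.e. the expectation of H_r(x) divided by r!.
    """
    coefficients = []
    for r in range(max_order + 1):
        total = 0.0
        for k in range(r // 2 + 1):
            weight = factorial(r) // (factorial(k) * factorial(r - 2 * k) * 2 ** k)
            total += (-1) ** k * weight * moments[r - 2 * k]
        coefficients.append(total / factorial(r))
    return coefficients


# Normal distribution in standard measure: moments 1, 0, 1, 0, 3, 0, 15.
NORMAL_COEFFS = type_a_coefficients([1, 0, 1, 0, 3, 0, 15], 6)

# Bean-length data in standard measure (moments quoted in Example 6.2).
BEAN_COEFFS = type_a_coefficients(
    [1.0, 0.0, 1.0, -0.910569, 4.862944, -12.574125, 53.221083], 6
)
print(NORMAL_COEFFS)
print(BEAN_COEFFS)
```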


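Edgeworth's Form of the Type A Series

6.24. Consider the Fourier transformation of a term $H_r(x)\alpha(x)$.

Since

$$\sqrt{2\pi}\,\alpha(t) = e^{-\frac{1}{2}t^2} = \int_{-\infty}^{\infty}e^{itx}\alpha(x)\,dx$$

we have

$$\sqrt{2\pi}\,\frac{d^r}{dt^r}\alpha(t) = (-1)^r\sqrt{2\pi}\,H_r(t)\,\alpha(t) = \int_{-\infty}^{\infty}(ix)^re^{itx}\alpha(x)\,dx$$

and thus the characteristic function of $x^r\alpha(x)$ is $i^r\sqrt{2\pi}\,H_r(t)\alpha(t)$. Conversely, by the
Inversion Theorem of 4.3, we have

$$x^r\alpha(x) = \frac{1}{2\pi}\int_{-\infty}^{\infty}e^{-itx}\,i^r\sqrt{2\pi}\,H_r(t)\,\alpha(t)\,dt.$$

Interchanging $x$ and $t$, we find

$$\sqrt{2\pi}\,(-i)^rt^r\alpha(t) = \int_{-\infty}^{\infty}e^{-itx}H_r(x)\,\alpha(x)\,dx$$

and hence, changing the sign of $t$, that the transform of $H_r(x)\alpha(x)$ is $\sqrt{2\pi}\,i^rt^r\alpha(t)$.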


Consider now the expression 


Its characteristic function is 


exp {k^)x{x) 


(6.41) 


j e''* exp (K,D<')x{x)dx = j dx 

^ f e^^Wonix) dx 
1' J —00 

r e«tr(_ 1 dx 

J‘ J —00 

== a/(27i) x(t) exp {—Kfity 

In a similar way it will be seen that the characteristic function of 

.|a(a-) 


exp 


o b 


1 ! 


-£» + 


2 ! 




h 

3! 


!£>» + ~*D* . . . 
4! 


is equal to 

v'(27r)a(0 exp \ity + + • . -j 

More generally, if 


(6.42) 

(6.43) 

(6.44) 


(a*—7n)* 


Bix) = —^—e ^ o* 
' ay/{2n) 



the characteristic function of
exp 

is equal to 




+ 


K, 


2 ! 




go- + =;i>‘ 




m 


(6.45) 


V(2^)oc(<<T)c<'»«exp . . j . (6.46) 

as may be seen by the same line of argument. 

Now suppose that (6.45) represents a frequency function. Its cumulative function
is then the logarithm of (6.46), i.e. is equal to

$$(a + m)\frac{it}{1!} + (b + \sigma^2)\frac{(it)^2}{2!} + \kappa_3\frac{(it)^3}{3!} + \ldots$$

and hence its cumulants are $a + m$, $b + \sigma^2$, $\kappa_3$, etc. We may
take $a = \kappa_1 - m$ and $b = \kappa_2 - \sigma^2$ and thus we obtain a distribution whose cumulants are
$\kappa_1$, $\kappa_2$, . . . etc. Now if these are in fact the cumulants of a distribution the series (6.45) must be
equal to that distribution, provided that (1) the series converges to a frequency function,
and (2) it is uniquely determined by its moments.

If we take the frequency function to be expressed in standard measure, then $\kappa_1 = 0$, $\kappa_2 = 1$
and (6.45) becomes

$$\exp\left\{-\frac{\kappa_3}{3!}D^3 + \frac{\kappa_4}{4!}D^4 - \ldots\right\}\alpha(x) = f(x) \qquad (6.47)$$

where we have written $\alpha(x)$ for $\beta(x)$ because now $m$ vanishes and $\sigma = 1$.

A series of this kind was derived by Edgeworth (1904), though from an entirely different
approach through the theory of elementary errors. Equation (6.47) is formally identical
with (6.40), and the reader who consults the original memoirs on this subject may be
puzzled by the fact that Edgeworth claimed his series to be different from the Type A
series and better as a representation of frequency functions. The explanation is that
for practical purposes it is necessary to take only a finite number of terms in the series and
to neglect the remainder. If we take the first $k$ terms in (6.40) the result is in general
different from that obtained by taking the first $(k - 1)$ terms of the operator in the
exponential of (6.47). The argument centred on the fact (cf. Example 6.3 below) that the
terms in (6.40) do not tend regularly to zero from the point of view of elementary errors, so
that in general no term is negligible compared with a preceding term.


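6.25. In standard measure the relations (6.38) become, in terms of cumulants,

$c_0 = 1$, $c_1 = c_2 = 0$
$c_3 = \frac{1}{6}\kappa_3$
$c_4 = \frac{1}{24}\kappa_4$
$c_5 = \frac{1}{120}\kappa_5$
$c_6 = \frac{1}{720}(\kappa_6 + 10\kappa_3^2)$
$c_7 = \frac{1}{5040}(\kappa_7 + 35\kappa_4\kappa_3)$
$c_8 = \frac{1}{40320}(\kappa_8 + 56\kappa_5\kappa_3 + 35\kappa_4^2)$ . (6.48)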


6.26. In the practical representation of frequency functions by the Type A series
only the first few terms can be taken into account. The term in $H_r(x)$ has a coefficient
dependent on $\mu_r$ and for $r > 4$ this is unreliable owing to sampling fluctuations. When
sampling effects are not in question the series may be taken to more terms, usually not
higher than the term in $H_6$. We should then have to investigate how far the observed
distribution can be represented by the series

$$f(x) = \alpha(x)\left\{1 + \frac{\kappa_3}{3!}H_3 + \frac{\kappa_4}{4!}H_4 + \frac{\kappa_5}{5!}H_5 + \frac{\kappa_6 + 10\kappa_3^2}{6!}H_6\right\} \qquad (6.49)$$

in the hope that the remainder after these terms could be neglected in comparison.

It may be noted in passing that the distribution function of such a series is easy to
obtain. If

$$f(x) = H_r(x)\,\alpha(x)$$

then

$$\int_{-\infty}^{x}f(x)\,dx = \int_{-\infty}^{x}H_r(x)\,\alpha(x)\,dx = -H_{r-1}(x)\,\alpha(x). \qquad (6.50)$$

TABLE 6.1

Fitting of Pearson Type IV Distribution and Gram-Charlier Type A Series to the Data of
Length of Beans (Table 1.15).

(From Pretorius, 1930.)

(The brackets mean that the frequencies shown are rounded up and include some small frequency
in blank rows covered by the brackets.)


Example 6.2

Consider the fitting of a Type A series to the bean data of Example 6.1.

We have already found the first four moments. In standard measure we have

$\mu_3$ = -0.910,569
$\mu_4$ = 4.862,944

and we also find

$\mu_5$ = -12.574,125
$\mu_6$ = 53.221,083.

Hence the series is

$$9440\,\alpha(x)\{1 - 0.151,762\,H_3 + 0.077,622,7\,H_4 - 0.028,903,6\,H_5 + 0.014,273,5\,H_6\}.$$

Table 6.1, on page 150, due to Pretorius (1930), shows the frequencies given by taking
the first three, the first four and the first five terms of this series (columns headed Type A(1),
Type A(2) and Type A(3) respectively). A glance at the figures will show that the four-
and five-term series are no better than the three-term and, if anything, rather worse.
Furthermore, the five-term series gives negative frequencies at one terminal and a mode at 12 mm.,
which is contrary to the data. The representation is clearly not very satisfactory and no
better than that given by the Pearson Type IV curve.
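The appearance of negative frequencies can be seen directly by evaluating the fitted series. The Python sketch below works in standard measure and uses the coefficients of the series above, with the coefficient of $H_3$ taken as $\mu_3/6 = -0.151,762$; the function names are hypothetical and the choice of deviates is only for illustration.

```python
import math

# Coefficients of H_3 .. H_6 in the fitted series (standard measure).
COEFFS = {3: -0.151762, 4: 0.0776227, 5: -0.0289036, 6: 0.0142735}


def hermite_value(n, x):
    """H_n(x) from the recurrence H_r = x H_{r-1} - (r - 1) H_{r-2}."""
    if n == 0:
        return 1.0
    previous, current = 1.0, x
    for r in range(2, n + 1):
        previous, current = current, x * current - (r - 1) * previous
    return current


def alpha(x):
    """Normal frequency function in standard measure."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def type_a_density(x, highest_order):
    """Type A series truncated after the term in H_highest_order."""
    bracket = 1.0 + sum(
        c * hermite_value(r, x) for r, c in COEFFS.items() if r <= highest_order
    )
    return alpha(x) * bracket


for deviate in (-3.0, 0.0, 3.0):
    print(deviate, type_a_density(deviate, 4), type_a_density(deviate, 6))
```

At a deviate of +3 the series carried to $H_6$ is negative, whereas the series stopped at $H_4$ is still positive there.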

Tetrachoric Functions 

6.27. The terms $H_r(x)\alpha(x)$ may be obtained from Jorgensen's tables combined with
those of the exponential $e^{-\frac{1}{2}x^2}$. Some related functions have also been tabulated in Tables
for Statisticians and Biometricians, Parts I and II. The function

$$\tau_r(x) = \frac{(-1)^{r-1}D^{r-1}\alpha(x)}{(r!)^{\frac{1}{2}}} = \frac{H_{r-1}(x)\,\alpha(x)}{(r!)^{\frac{1}{2}}} \qquad (6.51)$$

is known as the Tetrachoric Function of order $r$, and tables are available to seven places
of decimals for $r = 0\,(1)\,30$ and $x = 0\,(0.1)\,4$. In the notation of these functions, series
(6.49) would become

$$f(x) = \tau_1(x) + \frac{\sqrt{4!}}{3!}\kappa_3\tau_4(x) + \frac{\sqrt{5!}}{4!}\kappa_4\tau_5(x) + \ldots$$

and the particular series of Example 6.2 would be

$$f(x) = 9440\{\tau_1(x) - 0.743,477\,\tau_4(x) + 0.850,313\,\tau_5(x) - 0.775,565\,\tau_6(x) + 1.013,318\,\tau_7(x)\}.$$

The reason for the definition and the name of the function will appear in Chapter 14.


6.28. Up to this point it has been assumed that a frequency function possesses a
convergent Type A series. We shall not here enter into a discussion of the conditions under
which this is so, except to warn the reader that a great many mistakes have been made on
the subject and to quote some theorems without proof.

(1) Cramer (1926). If f{x) is a function which has a continuous derivative such that 



dx 


converges and tends to zero as $|x|$ tends to infinity, then $f(x)$ may be developed in
the series

$$f(x) = \sum_{j=0}^{\infty}c_jH_j(x)\,\alpha(x) \qquad (6.52)$$



where $c_j$ is given by

$$c_j = \frac{1}{j!}\int_{-\infty}^{\infty}f(x)H_j(x)\,dx.$$

This series is absolutely and uniformly convergent for $-\infty \le x \le \infty$.

(2) A theorem by Cramér (1926) based on one by Galbrun. If $f(x)$ is of bounded
variation in every finite interval and if



dx 


exists, then the expansion of $f(x)$ in the series (6.52) converges everywhere to the sum
$\frac{1}{2}\{f(x + 0) + f(x - 0)\}$. The convergence is uniform in every finite interval of continuity.

Cramér has also shown that this last theorem cannot be substantially improved upon
as regards the behaviour of $f(x)$ at infinity. Consider in fact the function $f(x) = e^{-\lambda x^2}$.

We have, in virtue of (6.33) and (6.31),



c ^*Hf,(x)dx = f e^^\xHr^idx — (r — l)f e H^^^dx 

J —00 J — 00 

= (r - J-^'Hr.i(x)dx. 


If r is odd the integral vanishes because 17, is an odd function of x. 
the integral becomes 


(2r - l)(2r - 3) 


- ] 

V X 2^r! V2A 


a - 
)■ 


If r is even, say 2r, 


The appropriate coefficient of I? 2 r 'T'yp® A scries is then 

(— l)»’(2r)! 

X ^ Of H^r =-—* The series then becomes 



Now when 



(2r)! /j _ ly 

2^^(v^y\ n) * 


In virtue of the Stirling approximation to the factorial, the rth term of this, say becomes 
in the limit 


so that 



Hence, for $\lambda < \frac{1}{4}$ the series is divergent.


6.29. From the statistical viewpoint, however, the important question is not whether
an infinite series can represent a frequency function, but whether a finite number of terms
can do so to a satisfactory approximation. It is possible that even when the infinite series
diverges its first few terms will give an approximation of an asymptotic character.

This subject has not yet been fully explored and there has been some controversy
about the value of the finite Type A series. Two things seem clear:

(a) The sum of a finite number of terms of the series may give negative frequencies,
particularly near the tails (as, for instance, in Example 6.2).

(b) The series in the Charlier form (6.40) may behave irregularly in the sense that
the sum of $k$ terms may give a worse fit than the sum of $(k - 1)$ terms.

How serious these disadvantages may be depends on the purposes in view. So far
as practical graduation is concerned it would appear that the finite Type A series is successful
only in cases of moderate skewness and in many such cases a Pearson distribution is just as
good. In many statistical inquiries we are more interested in the tails of a distribution
than its behaviour in the neighbourhood of the mode, and it is here that the Type A series
appears particularly inadequate.

But this is not by any means a unanimous view. Arne Fisher (1922) has considered
a modified form of the series which he claims to meet most of these criticisms. He considers
the series

$$f = (c_0 + c_1H_1 + \ldots + c_kH_k)\,\alpha(x) \qquad (6.53)$$

but determines the $c$'s, not from the observed moments and the relations (6.38) but by the
method of least squares, i.e. so that

$$\sum\{f - (c_0 + c_1H_1 + \ldots + c_kH_k)\,\alpha(x)\}^2$$

shall be a minimum. The method involves some laborious arithmetic, but Fisher has
successfully graduated a number of actuarial experiences by using it.

Two other actuarial statisticians have pointed out the difficulties of the Type A series,
Steffensen (1930) adducing some theoretical objections and Elderton (1938) summing up
in favour of Pearson distributions.


Example 6.3

As an illustration of the irregular behaviour of terms in the Type A series, consider
the distribution

$$dF = \frac{1}{\Gamma(p)}\,e^{-x}x^{p-1}\,dx, \qquad 0 \le x < \infty.$$

Its characteristic function is

$$\frac{1}{(1 - it)^p}$$

and thus

$$\kappa_r = p\,(r-1)!$$

or, in standard measure,

$$\kappa_r = \frac{(r-1)!}{p^{\frac{1}{2}r - 1}}.$$

From the manner of the formation of terms in (6.48) it is evident that the coefficient
$c_r$ is the sum of terms $\kappa_{q_1}\kappa_{q_2}\cdots\kappa_{q_m}$, where $(q_1 \ldots q_m)$ is a partition
of $r$ such that no $q$ is less than 3. It will then be clear that, since $\kappa_r$ is of order $p^{1-\frac{1}{2}r}$, the
term of greatest order in $p$ is that with the greatest number of parts in the partition.
For example, if $r = 9$ it is $(3^3)$, if $r = 8$ it is $(4^2)$, and so on.

From these considerations we can find the order in $p$ of the terms in the Type A series.
They are

Term          c3    c4    c5    c6    c7    c8    c9    c10   c11   c12
Order in p   -1/2   -1   -3/2   -1   -3/2   -2   -3/2   -2   -5/2   -2

The terms decrease in order of $p$, but not at all regularly, and it is clear that in general no
coefficient will be negligible compared with a preceding one if $p$ is large. The asymptotic
qualities of such series obviously require careful investigation in particular cases.
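The orders quoted above can be reproduced by enumerating partitions. In the Python sketch below (hypothetical function names) each cumulant $\kappa_q$ in standard measure is taken to be of order $p^{1-q/2}$, so that a product over a partition of $r$ into $m$ parts is of order $p^{m - r/2}$; the leading order of $c_r$ then comes from the partition into parts not less than 3 having the most parts.

```python
from fractions import Fraction


def partitions_min3(total, smallest=3):
    """Yield the partitions of `total` into parts >= smallest (non-decreasing order)."""
    if total == 0:
        yield ()
        return
    for part in range(smallest, total + 1):
        for rest in partitions_min3(total - part, part):
            yield (part,) + rest


def leading_order(r):
    """Exponent of p in the dominant term of c_r: (most parts) - r/2."""
    most_parts = max(len(partition) for partition in partitions_min3(r))
    return Fraction(most_parts) - Fraction(r, 2)


ORDERS = [leading_order(r) for r in range(3, 13)]
print([str(order) for order in ORDERS])
```

The irregular sequence of exponents shows at once why a later coefficient need not be negligible compared with an earlier one.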


The Type B Series 

6.30. Just as the Type A is derived from the normal integral, a Type B series has
been derived by Charlier from the Poisson distribution. Writing

$$\psi(m, x) = \frac{e^{-m}m^x}{x!} \qquad (6.54)$$

for integral values of $x$, put

$$\psi(m, x) = \frac{e^{-m}}{\pi}\int_0^{\pi}e^{m\cos t}\cos(m\sin t - xt)\,dt \qquad (6.55)$$

for all $x$. When $x$ is integral this reduces to (6.54).* In other cases

$$\psi(m, x) = \frac{e^{-m}}{\pi}\sin\pi x\sum_{j=0}^{\infty}\frac{(-1)^jm^j}{j!\,(x - j)}. \qquad (6.56)$$
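The integral form of the Poisson function can be checked numerically. The Python sketch below is an illustration under stated assumptions: it takes the integral form to be $\frac{e^{-m}}{\pi}\int_0^{\pi} e^{m\cos t}\cos(m\sin t - xt)\,dt$ and the series form to be $\frac{e^{-m}}{\pi}\sin\pi x\sum_j (-1)^j m^j/\{j!(x-j)\}$, evaluates the integral by Simpson's rule, and compares it with the ordinary Poisson term for integral $x$ and with the series for non-integral $x$. Function names, the number of intervals and the number of series terms are arbitrary choices.

```python
import math


def simpson(f, lo, hi, n_intervals):
    """Composite Simpson's rule on [lo, hi]; n_intervals must be even."""
    h = (hi - lo) / n_intervals
    total = f(lo) + f(hi)
    for i in range(1, n_intervals):
        total += (4.0 if i % 2 else 2.0) * f(lo + i * h)
    return total * h / 3.0


def psi_integral(m, x, n_intervals=2000):
    """Poisson function continued to all x by the integral form."""
    def g(t):
        return math.exp(m * math.cos(t)) * math.cos(m * math.sin(t) - x * t)
    return math.exp(-m) / math.pi * simpson(g, 0.0, math.pi, n_intervals)


def psi_series(m, x, terms=60):
    """Series form for non-integral x (x - j never vanishes)."""
    total = sum(
        (-1) ** j * m ** j / (math.factorial(j) * (x - j)) for j in range(terms)
    )
    return math.exp(-m) / math.pi * math.sin(math.pi * x) * total


def poisson_term(m, x):
    """Ordinary Poisson frequency for integral x."""
    return math.exp(-m) * m ** x / math.factorial(x)


print(psi_integral(2.0, 3), poisson_term(2.0, 3))
print(psi_integral(2.0, 2.5), psi_series(2.0, 2.5))
```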

Write V>'(»i. a; — 1) = y{m, x) — y{m, x — 1) 

no 

/(^) = .- • (6-57) 

This is the Type B. Charlier recommends it in cases of skewness when Type A is 
inapplicable (though the dividing-line is not clear). In theory it may be used for continuous 
variates, but in practice has only been applied to discontinuous variates proceeding by 
equal intervals. In fact, the objections to Type A apply a fortiori to the continuous form 
of Type B and various other complications appear (cf. Stefifensen, 1930). 


6.31. Defining polynomials 0^ by the relation 


we find that 


y{m, x)0^{m, x) = y(m, x) 

Of — coefficient of ^ in e~Yl + —\ 
r! \ m) 


. ( 6 . 68 ) 


m’’ 


(6.59) 


r2n 

♦ The integral I by the substitution 

Jo 

m* 

unit circle and is thus equal to 2n 


^mz 

z, is 2n times the residue of —in the 

iz^^l 



In a similar manner to that used for the Tchebycheff-Hermite polynomials we have


VOr - =0,-1 


m 


(6.60) 


(6.61) 


which may be compared with (6.31). It may also be shown that 

.... 

y(m, x) 

and thus $G_r$ may be calculated from the $r$th differences of the Poisson function $\psi(m, x)$ in
the same way that $H_r$ may be derived from the $r$th differential coefficient of the normal
distribution.

The (?’s also obey the orthogonal law 

— ^ '’'I 

r! ■ ... 

r = s) 

r f 


. (6.62) 




Thus if 


I 


. (6.63) 


x«0 


TABLE 6.2

Type B Series fitted to a Discontinuous Distribution of Particles emitted by a Radioactive
Element in Units of Time.

    x      Frequency     Type B       Type B       Type B
                        (2 terms)    (3 terms)    (4 terms)

    0          57          49.5         49.0         58.2
    1         203         201.3        201.0        199.8
    2         383         403.4        404.3        386.1
    3         525         532.3        533.8        523.9
    4         532         520.6        521.6        532.1
    5         408         402.0        402.6        418.2
    6         273         254.8        254.4        260.2
    7         139         137.1        136.7        134.0
    8          45          64.0         63.9         56.7
    9          27          26.1         26.2         22.9
   10          10           9.4          9.6          8.6
   11           4           3.0          3.1          3.6
   12           0           0.9          0.9          1.6
   13           1           0.2          0.2          0.8
   14           1           0.0          0.0          0.3

 Totals      2608        2606.2       2607.1       2609.0


In the same manner as for Type A we have, choosing the constant m equal to 

60 == 1 

61 0 

6i - ni) 

63 == 

64 = — 6/t* + /^a(ll — 6wi) + Sm{m — 2)} 

etc. 


Example 6.4

Table 6.2 shows the frequency of the number of alpha-particles ($x$) emitted by a bar
of polonium in intervals of 1/8th of a minute in some experiments by Rutherford and Geiger,
together with the frequencies given by the Type B series with two, three and four terms.
The calculations are due partly to A. Fisher and partly to Aroian (1937).


The Normalisation of Frequency Functions 

6.32. Several of the important theoretical distributions occurring in statistics depend
on some parameter $n$ in such a way that as $n$ tends to infinity the distribution tends to
normality. For large $n$ it is often a sufficient approximation to assume the distribution
normal, but for small or moderate $n$ this may be hardly exact enough. In such a case
we are nevertheless able to use the normal integral by seeking for a variate transformation

^ = CTo "f" ~h 0 ^ 2 ^* “■!“ U 3 X® ~j~ , , , , . • , (6.65) 

where the $a$'s are of order $n^{-\frac{1}{2}}$ or smaller. By choosing the $a$'s appropriately we can bring
the distribution of $\xi$ much nearer to normality than that of $x$ and hence find the distribution
function of $x$ from that of $\xi$ assumed normal.

Consider in fact the Edgeworth form of the Type A expansion (6.45) 


exp 


{- 


Kl 


m 


1 ! 


D + 


Kg — a 
21 




Xs 

3! 


+ 


1—. 

* J(7V27r 




. ( 6 . 66 ) 


We have retained the terms in D and because the approximation may perhaps be slightly 
improved by taking m and cr* in the f-distribution not quite equal to the mean and 
variance of x. 

We now assume that tb.e cumulant k,. is of order a cstse of fairly common occur¬ 
rence ; that Kl — m is, by choice of m, of order ; and that k, — by choice of cr*, 
is of order so that we may write 

Kl — m = lio 
K* — a* = ho* 

Then a* is of order i*e. and thus 


h ==s 0(n“^*) 
h ^ 0{n-^} 





where h and are 0(»“*) 

li and It are 0(n"*) 

' h is 0(n~^) 

It is 0(»~*), etc. 

Expanding the operator and retaining only terms up to and including 0(n“®) we find 
for the operator 

1 - kaD + iha*D^ - 

+ -A/Io*!)* + g}^1la»D» - hh<y^D» + IhhcJ^D* - -^hha^D^ 

+ ~ ihho^D^ + - ‘hhhoW> + + i ( - tia^D^ 

- \nhd^iy> + ~ 

+ + ihhha^I^ + hhhho^DO) + 

+ |f?i,cr«D« + i/fZyX*'* + thUia^^D^^) .(6.68) 

The result of this operation is a similar expression, which we will not bother to write 

/ Xfft\ 1 

out at length, with the operator o'D'replaced by (— and multiplied by 

The distribution function is given by integrating this expression, and we then have 
for the frequency less than or equal to m + ox (arranging the terms in order of 
magnitude in n) 

”H -j- yV^.hHj) -1- \liltHt + t H“ t + 

+ ^UhHt + .,hlArh + rithhHt + WfJW - i-hllHt + illHt + \nhHt 

+ '1- 4" 4" 4" Ts^^o^* 

+ + rhMH, + T^g-AH, + ThhUJI, “f* "4* i 

H“ 4“ (6.69) 


6,33. Now let f be a normal variate. We will determine i in terms of x such that 


£ 


^ -6 I dy = F{x), 


)-W{27t) 

F(x) being the distribution function given by the Type A expansion (6.69). 
We have 


(6.70) 


i: 




1 e-Jdy^f(i)^f(x+S-x) 


I 


_«V'(2ji:) 


--j 

e i ay 


(x - f) cl p 


1! dxj:.»-v/(27f) 


1 _"* , 

e n ay + .. . etc., 


by Taylor’s theorem, 





we see that when a: = 0, f =» — o#, a: — f is of order n ~*; and hence, to order «“• we have 
from (6.71), with « = 0, 

and this is equal to the expression in square brackets in (6.69) with « = 0. 

We then find 

On = ll — "t* ■f' "f” 

We can now find Oi in (6.72) by identifying coefficients in x, and so on. After some algebraic 
reduction we find, writing the terms in descending order in n, 

X — i ■ila(^* 1) “t" 

—•iVV»(6** 3) —J-iiljfa;* —1) +X2^1i(a;* —6a;* +3) -f-^?ii?(12a;* ■—7) 

- ~ 42x* + 1C) + — 187x + 62) — iZjX + ^hlthx 

+ — •s‘5^»^«(7a;® — ICx) — ■^lA(x* — x) + — l^x® + ICx) 

- -|l?^|x + tV^'*1?,(36x® - 49x) - shliiSa* - 32x® + 3i5x) + ^» 5 l,yg(llx» - 21x) 

- - 48x® + 61x) - - 187a;) + ^ij4i,(lllx» 

- 647x® + 466x) - ^^l^(948x* - 3628x» + 2473x).(6.73) 

This is our required expression of the variate f in terms of the variate x. To order 
w“* at least S will be normally distributed. 

It is often more convenient to express x in terms of This may be done by noting that 

* - I = g{x) = fif(| + X - ^) 

- 9(1) + (x - i)m + . . . 

=9(1) + 9’mm + 9'm +. •. 

and by continuing the process 

a; - I = 9(f) + 9(f)9'(f) + 9(f)9'*(f) + i9*(f)9"(f) + 9(f)9'®(f) + |9*(f)9'(f)9"(f) 

+ i9®(f)9"'(f) +.(6.74) 

Hence, using the value of | given by (6.73) we find, after some reduction, 

X - f - lx + iim - 1) + i^f + aV^4(f® - 3f) - ^11(21® - 5S) - ilMP - 1) 

+ ii5l.(f* - 6f* + 3) - - 5f* + 2) + 5|,l3(12f* - .63^* + 17) 

- ii'lf - wm - m + jhim - loi® + icf) + ^jy§(io^® - 25i) 

- - 24f® + 29f) - xJ-6W42f® - 17^® + 21|) + 

- 103f» + 107f)-■r,V«4(252s« - 1688f® + ISllf) . . • .(6.76) 


Example 6.5

Consider again the distribution of Example 6.3:

$$dF = \frac{1}{\Gamma(p)}\,e^{-x}x^{p-1}\,dx, \qquad 0 \le x < \infty.$$

We have already found that, in standard measure, this tends to the normal form, and
that $\kappa_r$ is of order $p^{1-\frac{1}{2}r}$.

We will take $l_1$ and $l_2$ of (6.67) to be zero, which implies that our normal variate $\xi$ is
to have the same mean and variance as that of $x$. We have

$$l_3 = 2p^{-\frac{1}{2}}, \quad l_4 = 6p^{-1}, \quad l_5 = 24p^{-\frac{3}{2}}, \quad l_6 = 120p^{-2}.$$





(6.69) then becomes 
1 




-odV'(2^) 


dx 


' V(2jt) 
1 


+ if- + + iob“- 






■}• 


Let us, as a simple illustration, find the distribution function of $x$ for $p = 9$, $x = 12$.
The mean of the distribution is then 9 and its variance 9, so that this corresponds to
a deviation $(12 - 9)/\sqrt{9}$ in standard measure, equal to unity. It is found from (6.30)
and an additional equation for $H_{11}$ that

$H_2 = 0$, $H_3 = -2$, $H_4 = -2$, $H_5 = 6$, $H_6 = 16$, $H_7 = -20$, $H_8 = -132$, $H_9 = 28$,
$H_{10} = 1216$, $H_{11} = 936$.


We then find for the distribution function

$$\int_{-\infty}^{1}\frac{1}{\sqrt{2\pi}}\,e^{-\frac{1}{2}x^2}\,dx + \frac{1}{\sqrt{2\pi}}\,e^{-\frac{1}{2}}\,(0.015,163,5).$$

The values for the normal function are obtained from the tables and we get the value

0.841,345 + (0.241,970,7)(0.015,163,5) = 0.8450,

which is exact to four places. The approximation is evidently fairly good, even for values
of $p$ as low as 9.

We could have found the same result by using (6.73). Substituting $x = 1$ in that
equation we find

$\xi$ = 1.015,386,

and the distribution function for the normal integral with deviate equal to this value of
$\xi$ is 0.8450 as before.
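This calculation can be followed in a short Python sketch. It assumes the normalising expansion with $l_1 = l_2 = 0$, keeping the terms in $l_3$ to $l_6$ up to the fourth order of smallness (these are the terms of (6.73) that survive when $l_1$ and $l_2$ vanish); the function names are hypothetical. For comparison the exact distribution function is computed from the finite Poisson sum, which is available because $p$ is an integer.

```python
import math
from statistics import NormalDist


def normal_deviate_from_x(x, l3, l4, l5, l6):
    """Normalised variate xi as a polynomial in the standardised deviate x."""
    x2, x3, x4, x5 = x ** 2, x ** 3, x ** 4, x ** 5
    return (
        x
        - l3 * (x2 - 1) / 6
        - l4 * (x3 - 3 * x) / 24
        + l3 ** 2 * (4 * x3 - 7 * x) / 36
        - l5 * (x4 - 6 * x2 + 3) / 120
        + l3 * l4 * (11 * x4 - 42 * x2 + 15) / 144
        - l3 ** 3 * (69 * x4 - 187 * x2 + 52) / 648
        - l6 * (x5 - 10 * x3 + 15 * x) / 720
        + l3 * l5 * (7 * x5 - 48 * x3 + 51 * x) / 360
        + l4 ** 2 * (5 * x5 - 32 * x3 + 35 * x) / 384
        - l3 ** 2 * l4 * (111 * x5 - 547 * x3 + 456 * x) / 864
        + l3 ** 4 * (948 * x5 - 3628 * x3 + 2473 * x) / 7776
    )


def exact_cdf_integer_p(p, x):
    """Exact distribution function of the Type III variate for integral p."""
    return 1.0 - math.exp(-x) * sum(x ** k / math.factorial(k) for k in range(p))


P = 9
L3, L4, L5, L6 = 2 * P ** -0.5, 6 / P, 24 * P ** -1.5, 120 / P ** 2
XI = normal_deviate_from_x(1.0, L3, L4, L5, L6)   # deviate (12 - 9)/3 = 1
APPROX = NormalDist().cdf(XI)
EXACT = exact_cdf_integer_p(P, 12.0)
print(XI, APPROX, EXACT)
```

The approximate and exact values agree to four places, as stated in the example.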

Suppose now we wish to find the deviate $x$ whose distribution function is $F(x) = 0.99$
when $p = 15$.

The normal deviate $\xi$ corresponding to such a value is found from tables to be 2.326,348.
We then have from (6.75)

* - « - +• 

which will be found to give

$x$ = 2.697,22.

This is the value in standard measure. The deviate in ordinary measure is

$15 + x\sqrt{15} = 25.45$.

This is exact to two places of decimals.
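The inverse calculation may be sketched in the same way. The Python code below assumes the expansion of $x$ in terms of $\xi$ with $l_1 = l_2 = 0$, again keeping the terms in $l_3$ to $l_6$ up to the fourth order (the terms of (6.75) that survive when $l_1$ and $l_2$ vanish); the function name is hypothetical, and the normal deviate is taken from the standard library rather than from tables.

```python
import math
from statistics import NormalDist


def x_from_normal_deviate(xi, l3, l4, l5, l6):
    """Standardised deviate x as a polynomial in the normal deviate xi."""
    s2, s3, s4, s5 = xi ** 2, xi ** 3, xi ** 4, xi ** 5
    return (
        xi
        + l3 * (s2 - 1) / 6
        + l4 * (s3 - 3 * xi) / 24
        - l3 ** 2 * (2 * s3 - 5 * xi) / 36
        + l5 * (s4 - 6 * s2 + 3) / 120
        - l3 * l4 * (s4 - 5 * s2 + 2) / 24
        + l3 ** 3 * (12 * s4 - 53 * s2 + 17) / 324
        + l6 * (s5 - 10 * s3 + 15 * xi) / 720
        - l4 ** 2 * (3 * s5 - 24 * s3 + 29 * xi) / 384
        - l3 * l5 * (2 * s5 - 17 * s3 + 21 * xi) / 180
        + l3 ** 2 * l4 * (14 * s5 - 103 * s3 + 107 * xi) / 288
        - l3 ** 4 * (252 * s5 - 1688 * s3 + 1511 * xi) / 7776
    )


P = 15
L3, L4, L5, L6 = 2 * P ** -0.5, 6 / P, 24 * P ** -1.5, 120 / P ** 2
XI_99 = NormalDist().inv_cdf(0.99)            # normal deviate for F = 0.99
X_STANDARD = x_from_normal_deviate(XI_99, L3, L4, L5, L6)
X_ORDINARY = P + X_STANDARD * math.sqrt(P)    # mean p, standard deviation sqrt(p)
print(XI_99, X_STANDARD, X_ORDINARY)
```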

The example shows that, notwithstanding the non-convergence of the infinite Type A 
series, a satisfactory approximation may be obtained from its first few terms, at least in 
certain cases. We may remark without proof that by an adaptation of a procedure given 
by Cramer (1928) it may be shown that an asymptotic expansion does in fact exist for 
the distribution of this example. 


NOTES AND REFERENCES 

An excellent account of Pearson's distributions is given in Elderton's book. Examples
of the fitting of the distributions to the data of experience abound in Biometrika.

For the Type A series see Charlier (1906 and 1931), Henderson (1922), Cramér (1928)
and Bowley (1928). For the Type B series see Charlier, Jordan (1927), Aroian (1937)
and Steffensen (1930).

Charlier has also proposed a Type C series, as to which see his paper of 1928 and the
brochure of 1931.

For the convergence of the Type A series and its relationship to elementary errors
see two admirable papers by Cramér (1926 and 1928).

A very good general account of these distributions and an examination of the
possibility of extending them to the bivariate case is given by Pretorius (1930), who gives a
number of references. Up to the present no entirely satisfactory system of bivariate
distributions corresponding either to those of Pearson or to those of Charlier has been
found.

For some early efforts by Edgeworth to transform distributions to the normal form,
see Bowley (1928) and Pretorius (1930). The approach of sections 6.32 and 6.33 is due
to Cornish and Fisher (1937), who give some tables which are useful in this type of work.

The polynomials $H_r(x)$ are frequently referred to by English writers as Hermite
polynomials, but they are really due to Tchebycheff (Mémoires de l'Académie de Saint-Pétersbourg,
1860). Hermite's papers on this subject followed four years later (Comptes rendus, 58,
93 and 266).

Aroian, L. A. (1937), “The Type B Gram-Charlier series,” Ann. Math. Statist., 8, 183.
Bowley, A. L. (1928), F. Y. Edgeworth's Contributions to Mathematical Statistics, Royal Statistical Society.
Charlier, C. V. L. (1906), Researches into the Theory of Probability, Lund.
Charlier, C. V. L. (1928), “A new form of the frequency function,” Meddelande från Lunds Astronomiska Observatorium, Series II, No. 51.
Charlier, C. V. L. (1931), Applications à l'astronomie (one of the series in Borel's Traité du calcul des Probabilités, Gauthier-Villars, Paris).
Comrie, L. J. (1939), Tables of $\tan^{-1} x$ and $\log (1 + x^2)$, Cambridge University Press.
Cornish, E. A., and Fisher, R. A. (1937), “Moments and cumulants in the specification of distributions,” Revue de l'Inst. Int. Stat., 5, 307.
Cramér, H. (1926), “On some classes of series used in mathematical statistics,” Den sjette Skandinaviske Matematikercongres, Copenhagen.
Cramér, H. (1928), “On the composition of elementary errors,” Skandinavisk Aktuarietidskrift, 13 and 141.
Edgeworth, F. Y. (1904), “The Law of Error,” Cambridge Phil. Trans., 20, 36, 113 (and an appendix issued with bound reprints).
Elderton, Sir W. P. (1938), Frequency Curves and Correlation, 3rd edition, Cambridge University Press.
Fisher, Arne (1922), Frequency Curves, Macmillan.
Henderson, J. (1922), “On expansions in tetrachoric functions,” Biometrika, 14, 157.
Jordan, C. (1927), Statistique Mathématique, Gauthier-Villars, Paris.
Jørgensen, N. R. (1916), Undersøgelser over Frequensflader og Korrelation, Busck, Copenhagen.
Pearson, K. (1925), “The fifteen constant bivariate frequency surface,” Biometrika, 17, 268.
Pretorius, S. J. (1930), “Skew bivariate frequency surfaces examined in the light of numerical illustrations,” Biometrika, 22, 109.
Romanovsky, V. (1924), “Generalisation of some types of the frequency curves of Professor Pearson,” Biometrika, 16, 106.
Steffensen, J. (1930), Some Recent Researches in the Theory of Statistics and Actuarial Science, Cambridge University Press.
Wishart, J. (1926), “On Romanovsky’s generalised frequency curves,” Biometrika, 18, 221.


EXERCISES 

6.1. Show that for the Pearson distributions

$$\frac{d \log y}{dx} = \frac{x}{B_0 + B_1 x + B_2 x^2}$$

the range is unlimited in both directions if $B_0 + B_1 x + B_2 x^2$ has no real roots; limited
in one direction if the roots are real and of the same sign; and limited in both directions
if the roots are real and of opposite sign.


6.2. Show that the Pearson Type VI curve may be written

$$y = y_0 \left(1 - \frac{x^2}{a^2}\right)^{-m} e^{-\nu \tanh^{-1}(x/a)}$$

and discuss the relationship with the Type IV curve.


6.3. Assign the following distributions to one of Pearson’s types:

dF = ke~^x'~^ 


dF = 


kdt 



dF “ t(l — r^) dr 
dF = 

(All these distributions are important in the theory of sampling.) 


6.4. Show that for the Type B series the coefficients of equation (6.63) may be written 
symbolically— 


6^ = i(// - m)Wl 

(C. Jordan, 1927.) 


6.5. Show that, in the notation of 6.30, 

Vy(m, x) = - ^v{m, x + 1), 

A 

Hence that F^y{m, «) = 1 - /„(A + 1), 




where $I$ is the incomplete Γ-function; and hence that the sum of the first (A + 1) terms
of the Type B series is given by

6o{l - IJX + 1)} - (6x + + . : .)y(w, A) 

(C. Jordan, 1927.) 


6.6. Show that if y is a function of x which it is desired to represent approximately 
by the form 

r 

y = 

then the values of the c’s appropriate to the expansion of y in this form are such as to
minimise the sum 


j: 






V«(*) 


df a -4- ic 

6.7. Show that for a Pearson distribution -i — ,——,-,—, 

/ + b^x + 

function obeys the relation 

+ (1 + 26j + bid) + (® + = 0, 

where θ = it. Deduce the recurrence relation between the moments.
Show also that the cumulative function obeys the relation 

(^^)*} ■*' 

Hence show that the cumulants obey the recurrence relations 


the characteristic 


{1 + (r + 2)6 ,}k,+i + rbiK^ -f ^ + (^ 2 ^ 


KsK^-i + . . 


+ ^ ^ 1-1 + • • • + ^ j — 0. 


6.8. Show that no distribution which is not completely determined by its moments 
can be expanded in a convergent Type A series. 


6.9. If the distribution 


is transformed by 
and 

show that 


•v/{2ji) 


00 < a: < c» 


X = -^(logio f - /) 


i = log„e, A = e0‘*‘, 

/?! = A*(A + 3)-4 
- 3 == A*(A* + 2A 4- 3) - 6 

/<3(/<i)® - 3/4(yi)‘‘ - yl =0, 


and that 





where moment about the start of the transformed curve. Thence that 

Z = 2 log ju'i — J log ini + fii*) 
bk^ = 2 log fix — 21. 

6.10. Show that if a function in standard measure is expanded in a Type A series
the coefficients of the second and third terms depend respectively on $\mu_3$ and $\mu_4$ and thus
provide measures of skewness and kurtosis.





CHAPTER 7 

PROBABILITY AND LIKELIHOOD 

7.1. The previous six chapters have dealt with the theory of statistical distributions
from a descriptive point of view. It has been explained that the distributions occurring
in practice exhibit certain regular features which permit of representation by mathematical
forms; that they can be characterised by certain parameters such as moments and cumulants;
and that certain general theorems about distribution and frequency functions can
be deduced. We now begin a study of a different kind, namely, the inquiry whether any 
statements can be made about populations or their parameters and distributions when 
only a sample of the populations is available for scrutiny. Except in trivial cases it is 
not possible to make any statements on these matters with the categorical certainty of 
deductive logic ; but it is possible, and indeed it is necessary if scientific inquiry is to go 
forward at all, to make statements of a less definite nature in terms of probability. In 
this chapter we shall accordingly be concerned with the theory of probability as it affects 
statistics and in subsequent chapters with its applications in statistical theory. 

7.2. In ordinary speech we use the words “probability”, “chance” or “likelihood”
to describe an attitude of mind towards some proposition of whose truth we are
not certain. We say that it is improbable that life exists on Mars, that the chances are
that if a penny is tossed ten times it will come down heads at least once, that it is likely
to rain to-morrow, and so on. It is rarely indeed in practical affairs that we are confronted
with a proposition of whose truth we are absolutely certain. Nevertheless, we often have
to assume that such propositions are true or untrue in order to reach decisions and to act
in a rational way.

The attitude of doubt we adopt is described in terms of probability. We say that
the propositions are more or less probable and accept or reject them accordingly.

7.3. A little introspection will convince the reader that all the attitudes of mind to 
which we relate the concept of probability have certain things in common :— 

(a) They concern propositions. The mind considers a proposition which has meaning
and assumes towards it a certain attitude of doubt. It is very common both in mathematics
and in statistics to speak of the probability of an event, or even of a variate-value;
but these are condensed expressions for the proposition that an event will happen or that
a member of a population has a given variate-value, and, though very convenient shorthand
expressions which will often be used in the sequel, must not be allowed to obscure the
essential fact that propositions are concerned.

(b) There are degrees of probability. We say that it is very improbable that a hundred 
tosses with a penny will not result in a head; that it is more probable that horse A will
win a race than that horse B will do so ; that the probability of having wet weather in the 
course of an English summer is so great as to be near certainty. But it does not follow 
(and some writers on the logic of probability do not admit) that every pair of probabilities 
can be compared. It could with consistence be maintained that, whereas we may compare 
the probability of getting ten trumps in a game of cards with that of getting eight, there 
is no way of comparing the probabilities of the propositions, say, that there exists a planet 
outside the orbit of Pluto and that the human race will ultimately go bald. 



(c) The degree of probability attributed to a proposition varies according to the amount 
of relevant evidence available to the particular mind considering the proposition. If we 
know that a horse has won its three previous races we attach a greater probability to the 
proposition that it will win the next. If we know that a penny has heads on both sides 
the probability that it will come down heads when tossed is so great as to amount to 
certainty ; and so on. 

(d) Pursuing this last point, we see that certainty can be regarded as a limiting form 
of probability. As a proposition becomes more and more probable it tends towards certain 
truth; as it becomes more and more improbable it tends towards certain untruth. 

7.4. The object of the theory of probability is to give to the somewhat indefinite 
notions described above the precision of a science, and, since numerical measurement is 
the greatest precision which a science can possess, to measure probability numerically. 
Several writers have explored the more general problem, foreshadowed as early as Leibniz, 
of developing a logic of probabilities, and the reader who is interested may refer to the 
work of Keynes (1921), F. P. Ramsey (1931) and Johnson (1921-4). From the statistical 
viewpoint the interest of this subject centres in the numerical theory of probability which
alone will concern us in this book. 

It is at this point that we arrive at the first of the differences of opinion among 
authorities on the theory of probability. Some writers try to include all the ideas generally 
associated with the word ‘‘ probability ” within the scope of their theory, which is thus 
applicable to any of the attitudes of doubt covered by the meaning of the word. The 
principal modern exponent of this viewpoint is Jeffreys, whose book (1939) should certainly 
be read by all serious students of the subject. Most statisticians, on the other hand, are 
concerned with the probabilities of propositions of a particular kind, namely, those which 
form the members of populations of propositions. Under the more general theory, it has 
meaning to speak of the probability of an isolated proposition such as the one that Shakespeare’s
plays were written by Francis Bacon. In statistics we are more usually concerned
with the proposition which asserts the happening of some event which could have arisen
in a specified number of ways, such as the throwing of a number with an ordinary die.
The first approach takes probability to be an undefined idea, like the straight line of Euclidean 
geometry, and builds up the theory from certain axioms. The second approach seeks to 
define probability in terms of the relative frequency of events and thus to throw the theory 
back on to the pure mathematics of abstract ensembles (Kolmogoroff, 1933) or to the 
limiting properties of sequences (von Mises, 1936). The reader who is perplexed by the 
controversy between the adherents of the axiomatic and the frequency theories will find 
many of his difficulties resolved by the consideration that the two theories cover different
domains of thought, or rather, that the axiomatic theory attempts to cover a wider domain
than the frequency theory.

7.5. This, however, does not explain away the whole of the difficulty, and the reader
will have to choose for himself among the various possible sets of fundamental ideas forming
the starting-point of the theory. When we consider the concept of probability as a psychological
matter we can either suppose that further analysis is impossible or unprofitable,
in which case the axiomatic approach seems inevitable; or we can ask how the mind comes
to take up an attitude of belief in propositions which confront it. It is not necessary here
to go into this question at length, but there would, in my own opinion, be a considerable
measure of agreement that the concept of probability is founded on our experience of the
frequency of observed phenomena. When we say that the probability of a coin coming
down heads on being tossed is one-half we have in mind, I think, that if it is tossed a large
number of times it will come down heads in approximately half the cases. Even in extreme
cases, say, when we attempt to assess the probability of a horse winning a given race, an
event which cannot be repeated, we are, I think, picturing our estimation as one of a number
of similar acts and assessing the relative frequency of the horse's victory in that population.

But it has to be admitted that, even if this be true, there is no necessity to use the
concept of frequency in the axiomatisation of the theory. The concept of a straight line
may very well be founded on our experience of the local properties of rays of light, but it
does not follow that the indefinables of Euclidean geometry are to be analysed into optical
concepts.

The Basic Rules of Direct Probability 

7.6. For our present purposes the problems of fundamentals may be passed over, 
since all parties are agreed on the rules governing the calculus of direct probabilities. (The 
so-called inverse probabilities will require more discussion and will be dealt with later.) 
We therefore enunciate these rules without attempting to deduce them from more primitive 
propositions. 

In the first place it is assumed that probability is measurable on a continuous scale, 
so that any probability can be expressed as a real number. We shall, in fact, say that a 
probability is a real number. This assumption implies, among other things, that any 
two probabilities may be compared; for if they are measured by the numbers x and y
we may say that the probability of the first is greater than, equal to, or less than, that of
the second according as x > y, x = y, or x < y.

7.7. The probability of a proposition q on data p is written $P(q \mid p)$. We have then
Rule 1:

If p entails q, $P(q \mid p) = 1$ . . . . (7.1)

If p entails not-q, $P(q \mid p) = 0$ . . . . (7.2)

This rule defines the end-points of our scale of probability. Certainty that a proposition
is not true is represented by zero, certainty that it is true by unity. Any probability lies
in the range 0 to 1.

7.8. Rule 2. If $q_1, q_2, \ldots, q_n$ are $n$ equally probable and mutually exclusive
propositions on data p, and if $Q$ is a subset of $m$ of these propositions, then

$P(Q \mid p) = \dfrac{m}{n}$ . . . . (7.3)

This proposition is the starting-point of the frequency theory of probability. It is
usually stated in some such form as: if of a set of n mutually exclusive and equally probable
events m are distinguished by some characteristic A, the probability of an event bearing
A is $m/n$.

The objection to this rule from the logical viewpoint is that it contains the concept
“equally probable” and is thus circular if one adopts it as a definition. The mathematical
theorist dealing with probability, in the mathematician’s facile way, overcomes this trouble
either by accepting the circular definition, or by defining probability purely as a property
of sets of points. For example, such a definition might be: if of an aggregate of objects
n in number m are characterised by some quality A, the probability of any member bearing
A is, by definition, the number $m/n$. To take a more sophisticated line, we can regard the
objects as points of a set, attach set-functions to them obeying certain axioms and postulates,
and thus build up the theory of probability as a branch of the theory of set-functions.
Any verification of the theory, any test whether it provides a reasonably accurate picture
of the way things happen in the world, is referred to experimental physics. The mathematician,
of course, is used to this devolution of responsibilities, but the statistician is
concerned with concordance between theory and practice and cannot always leave experimental
verification to others.

7.9. Rule 3. If the probabilities of n mutually exclusive propositions $q_1, q_2, \ldots, q_n$
on data p are $P_1, \ldots, P_n$, then the probability on data p that one of them is true is

$$P_1 + P_2 + \cdots + P_n.$$

This is generally known as the theorem of the addition of probabilities. In the
language of the textbooks, the probability that one of n mutually exclusive events will
happen is the sum of their separate probabilities.

7.10. Rule 4. The probability of two propositions q and r on data p is the product
of the probability of q given p and that of r given q and p. Symbolically,

$P(qr \mid p) = P(q \mid p)\,P(r \mid qp)$ . . . . (7.4)

Since q and r appear symmetrically we also have

$P(qr \mid p) = P(r \mid p)\,P(q \mid rp)$ . . . . (7.5)

From the frequency standpoint this rule is almost self-evident. If, of a set of n members,
(a) bear the characteristic A, (b) the characteristic B, and (ab) both characteristics, then the rule
states that

$$\frac{(ab)}{n} = \frac{(a)}{n} \cdot \frac{(ab)}{(a)} = \frac{(b)}{n} \cdot \frac{(ab)}{(b)},$$

a simple arithmetical proposition.

More generally we have

$P(q_1 q_2 \ldots q_k \mid p) = P(q_1 \mid p)\,P(q_2 \mid q_1 p)\,P(q_3 \mid q_2 q_1 p) \cdots P(q_k \mid q_{k-1} \ldots q_1 p)$ . . . . (7.6)

a result which follows from the repeated application of Rule 4.

If, as a particular case,

$P(qr \mid p) = P(q \mid p)\,P(r \mid p)$ . . . . (7.7)

we have, in virtue of (7.4),

$P(r \mid p) = P(r \mid qp)$ . . . . (7.8)

and q is then said to be irrelevant to r, given p. A knowledge of q does not affect the
probability of r on data p.
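The frequency reading of Rule 4 can be checked by direct counting. The following is a minimal sketch in Python, not part of the original text; the population and the function name `product_rule_sides` are hypothetical, chosen only for illustration. It counts (a), (b) and (ab) in a small set and compares the three expressions of the rule, and then tests the irrelevance condition (7.8) for the same set.

```python
from fractions import Fraction

# Hypothetical population: each member is a pair (bears_A, bears_B).
population = (
    [(True, True)] * 3
    + [(True, False)] * 5
    + [(False, True)] * 6
    + [(False, False)] * 10
)


def product_rule_sides(members):
    """Return (ab)/n, (a)/n * (ab)/(a) and (b)/n * (ab)/(b) as exact fractions."""
    n = len(members)
    a = sum(1 for has_a, _ in members if has_a)
    b = sum(1 for _, has_b in members if has_b)
    ab = sum(1 for has_a, has_b in members if has_a and has_b)
    joint = Fraction(ab, n)
    via_a = Fraction(a, n) * Fraction(ab, a)
    via_b = Fraction(b, n) * Fraction(ab, b)
    return joint, via_a, via_b


joint, via_a, via_b = product_rule_sides(population)

# Irrelevance in the sense of (7.8): P(B | data) equals P(B | A and data).
n = len(population)
a = sum(1 for has_a, _ in population if has_a)
b = sum(1 for _, has_b in population if has_b)
ab = sum(1 for has_a, has_b in population if has_a and has_b)
a_is_irrelevant_to_b = Fraction(b, n) == Fraction(ab, a)

print(joint, via_a, via_b, a_is_irrelevant_to_b)
```

Exact fractions are used rather than floating-point numbers so that the equality of the three expressions is tested without rounding.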

7.11. The above four rules and various elaborations of them provide the basis of 
the direct theory of probability, which is concerned with problems of the type : given a set 
of propositions with known probabilities, determine the probability of some contingent 
proposition. This is a branch of pure mathematics and will be found discussed, for example, 
in most textbooks of algebra. Ultimately all problems in this branch of the theory are 
reducible to the counting of the number of ways in which certain events can happen. The 
following examples will illustrate the type of investigation involved. 





Example 7.1 

What is the probability that a specified player will get a hand containing 13 cards
of one suit at a single deal at a game of bridge?

We have to consider here the total number of ways in which a given player can be
dealt a hand of cards. There are 52 cards and 13 can be chosen from them in $\binom{52}{13}$ ways.
Of these ways only four will contain cards of one suit.

We then assume that all the possible deals are equally probable and are thus able to
apply Rule 2 with $m = 4$ and $n = \binom{52}{13}$, so that the probability is

$$P = \frac{4}{\binom{52}{13}} = \frac{4 \cdot 39!\, 13!}{52!}.$$

Factorial expressions of this kind may be found from tabled logarithms of factorials or
by the use of the Stirling approximation. In this particular case we find

$P = 6 \times 10^{-12}$ approximately.
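With exact integer arithmetic the factorial expression can be evaluated directly instead of through tables of logarithms. The sketch below is an illustration in Python, not part of the original text; it uses `math.comb` and `math.factorial` from the standard library, and the variable names are my own.

```python
from fractions import Fraction
from math import comb, factorial

# Number of distinct 13-card hands that can be dealt from 52 cards.
hands = comb(52, 13)

# Rule 2: four favourable hands (one per suit) out of all equally probable hands.
p = Fraction(4, hands)

# The same probability written as 4 * 39! * 13! / 52!.
p_factorial = Fraction(4 * factorial(39) * factorial(13), factorial(52))

print(hands)
print(float(p))
```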


Example 7.2

n letters, to each of which corresponds an envelope, are placed in the envelopes at
random. What is the probability that no letter is placed in the right envelope?

The condition that the letters are put in the envelopes “at random” is to be interpreted
as meaning that every possible way of assigning the letters to envelopes is equally
probable. The question, under Rule 2, then reduces to the purely algebraic one: in what
proportion of the possible cases does no letter get into the right envelope?

Suppose that $u_n$ is the number of ways in which all the letters go wrong. Consider any
particular letter. If this occupies another’s envelope and vice-versa, which can happen
in $n - 1$ ways, the number of ways in which the remaining $n - 2$ letters can go wrong is
$u_{n-2}$. If the letter occupies another's place, which can happen in $(n - 1)$ ways, and
not vice-versa, there are $u_{n-1}$ ways in which the others can go wrong. Hence we have
the difference equation

$$u_n = (n - 1)(u_{n-1} + u_{n-2}).$$

We may re-write this

$$u_n - n u_{n-1} = -\{u_{n-1} - (n - 1) u_{n-2}\},$$

and putting $v_n = u_n - n u_{n-1}$ we find

$$v_n = -v_{n-1} = (-1)^{n-2} v_2.$$

Thus $u_n - n u_{n-1} = (-1)^{n-2}(u_2 - 2u_1)$.

But $u_1 = 0$ and $u_2 = 1$ and thus

$$\frac{u_n}{n!} - \frac{u_{n-1}}{(n-1)!} = \frac{(-1)^n}{n!},$$

whence

$$u_n = n! \left\{ \frac{1}{2!} - \frac{1}{3!} + \cdots + \frac{(-1)^n}{n!} \right\}.$$

The total number of possible ways is $n!$. Thus the probability required is

$$\frac{1}{2!} - \frac{1}{3!} + \cdots + \frac{(-1)^n}{n!},$$

i.e. the first $(n - 1)$ terms of the series $e^{-1} = \frac{1}{2!} - \frac{1}{3!} + \cdots$.
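The difference equation and the closed form can be compared numerically. The following Python sketch is an illustration only; the function names are hypothetical. It computes the number of ways in which every letter goes wrong from the recurrence u_n = (n - 1)(u_{n-1} + u_{n-2}) with u_1 = 0 and u_2 = 1, checks it against a brute-force count over all permutations for small n, and compares the resulting probability with the alternating series.

```python
from fractions import Fraction
from itertools import permutations
from math import factorial


def ways_all_wrong(n):
    """u_n from the recurrence u_n = (n - 1)(u_{n-1} + u_{n-2}), u_1 = 0, u_2 = 1."""
    if n == 1:
        return 0
    older, newer = 0, 1  # u_1, u_2
    for k in range(3, n + 1):
        older, newer = newer, (k - 1) * (newer + older)
    return newer


def prob_no_letter_right(n):
    """Rule 2: favourable assignments divided by the n! equally probable ones."""
    return Fraction(ways_all_wrong(n), factorial(n))


def prob_by_series(n):
    """1/2! - 1/3! + ... + (-1)^n / n! as an exact fraction."""
    return sum((Fraction((-1) ** k, factorial(k)) for k in range(2, n + 1)), Fraction(0))


def count_by_enumeration(n):
    """Brute-force count of assignments with no letter in its own envelope."""
    return sum(
        1
        for perm in permutations(range(n))
        if all(perm[i] != i for i in range(n))
    )


print([ways_all_wrong(n) for n in range(1, 8)])
print(float(prob_no_letter_right(10)))
```

The enumeration is practical only for small n, since it visits all n! assignments; the recurrence has no such limit.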

Example 7.3 

Three pennies are tossed. What is the probability that they fall either all heads or
all tails?

We assume that the probability of a head with any penny is 1/2 and that the result
with one penny is independent of that with the others. Then there are eight possible
and equiprobable cases, HHH, HHT, HTH, HTT, THH, THT, TTH, TTT. Two of
these give us all heads or all tails and hence the required probability is 1/4.

Now consider this argument: there are two possibilities, either the three coins all fall
alike or two of them are alike and the other different. Of these two possibilities one is
of the type required and therefore the probability is 1/2.

Consider also this argument: there are four possibilities, three heads, two heads and
a tail, two tails and a head, three tails. Two of these four are of the type required and
therefore the probability is 1/2.

Finally, consider this argument: of the three coins two must fall alike. The other
must either be the same as these two or different. Thus there are two possibilities and
again the chance is 1/2.

These three arguments are fallacious. They assume equiprobability among events
which are not equiprobable and the application of Rule 2 is not legitimate. For example,
in the first case, it is true that there are two possibilities, but they are not equal under
our assumptions. The reader may care to examine why this is so and how the other two
arguments break down on the same point.
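One way to see where the fallacious arguments fail is to list the eight equiprobable cases and count how many fall into each of the coarser classes those arguments treat as equal. The Python sketch below is an illustration only, with variable names of my own choosing.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# The eight equiprobable cases HHH, HHT, ..., TTT.
outcomes = list(product("HT", repeat=3))

# Correct application of Rule 2 to the equiprobable cases.
all_alike = [o for o in outcomes if len(set(o)) == 1]
p_all_alike = Fraction(len(all_alike), len(outcomes))

# First argument: "all alike" versus "two alike, one different".
pattern_weights = Counter(
    "all alike" if len(set(o)) == 1 else "two alike, one different"
    for o in outcomes
)

# Second argument: classes by number of heads (3, 2, 1, 0).
head_count_weights = Counter(o.count("H") for o in outcomes)

print(p_all_alike)
print(dict(pattern_weights))
print(dict(head_count_weights))
```

The classes contain unequal numbers of the elementary cases, which is why Rule 2 cannot be applied to them as though they were equally probable.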

Example 7.4

Peter and Paul play a game with two dice. Peter plays first by throwing the dice
together. If the total number of points is a prime number other than 2 he wins outright;
if it is even he throws again under the same conditions; in other cases the throw passes
to Paul, who throws under the same conditions. What is the probability of Peter's winning?

It is to be assumed that the probabilities of throwing any number 1 to 6 with either
die are equal. The possible throws are 2, 3, 4 ... 12 and the number of ways in which
they can occur are:

Total points    2   3   4   5   6   7   8   9  10  11  12   Total
No. of ways     1   2   3   4   5   6   5   4   3   2   1      36

Thus, according to Rule 2, the probability (1) of throwing a prime other than 2 is 14/36,
(2) of throwing an even number is 18/36, (3) of throwing neither is 4/36.

These three events are mutually exclusive. Let P be the probability of Peter’s winning.
Now if Peter throws a prime other than 2 he wins outright, and the probability
of his doing so is thus 14/36; if he throws an even number he throws again, and his probability
of winning in this case (according to Rule 4) is 18P/36; if he throws neither the throw
passes to Paul, whose chance is then P, so that Peter’s chance of winning is (4/36)(1 − P).

Thus, according to Rule 3, we have

$$P = \frac{14}{36} + \frac{18P}{36} + \frac{4}{36}(1 - P),$$

giving

$$P = \frac{18}{22}.$$
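The three probabilities and the resulting equation for P can be verified by enumerating the 36 equiprobable throws. The following Python sketch is an illustration only; the variable names are hypothetical. It solves P = p_win + p_again * P + p_pass * (1 - P) for P in exact arithmetic.

```python
from fractions import Fraction
from itertools import product

# Totals for the 36 equally probable throws of two dice.
totals = [a + b for a, b in product(range(1, 7), repeat=2)]
n = len(totals)

odd_primes = {3, 5, 7, 11}  # primes other than 2 attainable with two dice

p_win = Fraction(sum(1 for t in totals if t in odd_primes), n)   # wins outright
p_again = Fraction(sum(1 for t in totals if t % 2 == 0), n)      # throws again
p_pass = 1 - p_win - p_again                                     # throw passes to Paul

# P = p_win + p_again * P + p_pass * (1 - P), solved for P.
P = (p_win + p_pass) / (1 - p_again + p_pass)

print(p_win, p_again, p_pass, P)
```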


7.12. It is possible to carry mathematical problems on the foregoing lines to great
lengths, and a considerable amount of ingenuity has been expended in doing so. The
important thing to note from the point of view of the theory of probability is that in all
such cases certain probabilities are stated a priori, either explicitly or implicitly in some
such form as “the dice are perfect” or “the selection is made at random”. One of the
most formidable problems of statistics is that only in exceptional cases is there any prior
certainty about the probabilities of observed events.


Probability in a Continuum 

7.13. Up to this point we have considered only probabilities of finite and discrete 
events; but we may also ask whether any meaning can be attached to probabilities in 
a continuum. For example, if a square is inscribed in a circle, what is the probability that 
a point taken at random in the circle is also inside the square ? If a line is divided into 
three segments, what is the probability that they can form a triangle? What is the probability
that $x < x_0$, where $x$ is a positive real number less than $y_0$? And so on.

All probabilities of this kind must be considered as limits. Consider the first example,
that of the square inscribed in the circle. Imagine the whole figure divided into small
cells of area ε by a rectangular mesh. If we assume that the occurrence of a point in a cell
is equally probable for all cells, the probability that a point falls inside both circle and
square is the ratio of the number of cells in the latter to those in the former, neglecting the
cells at the edges which become of diminishing importance as ε → 0. In fact, the required
probability can be made as near the ratio of the area of the square to that of the circle as
we please by taking ε small enough. We may say that the probability is that ratio, which
is easily seen to be $2/\pi$, an incommensurable number.
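The limiting process with a rectangular mesh can be imitated numerically. The sketch below is a Python illustration under stated assumptions: the circle has unit radius, the inscribed square has its sides parallel to the mesh, and a cell is counted as inside a figure when its centre is. The function name `mesh_ratio` is hypothetical. As the mesh is refined the ratio of cell counts approaches 2/π.

```python
import math


def mesh_ratio(cells_per_side):
    """Ratio of mesh cells inside the inscribed square to cells inside the unit circle.

    The bounding square [-1, 1] x [-1, 1] is divided into cells_per_side ** 2 equal
    cells; a cell is assigned to a figure if its centre lies in that figure.
    """
    h = 2.0 / cells_per_side
    half_side = 1.0 / math.sqrt(2.0)  # inscribed square is |x|, |y| <= 1/sqrt(2)
    in_circle = 0
    in_square = 0
    for i in range(cells_per_side):
        x = -1.0 + (i + 0.5) * h
        for j in range(cells_per_side):
            y = -1.0 + (j + 0.5) * h
            if x * x + y * y <= 1.0:
                in_circle += 1
                if abs(x) <= half_side and abs(y) <= half_side:
                    in_square += 1
    return in_square / in_circle


for size in (20, 100, 400):
    print(size, mesh_ratio(size), 2.0 / math.pi)
```

Counting by cell centres is only one convention for the edge cells; any convention that neglects or splits them gives the same limit.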

We should get the same limiting form of probability if we took other meshes which 
adequately represented areas ; but it is most important to specify the method of procession 
to the limit in speaking of probabilities in a continuum. Otherwise the result has no 
meaning. The following example will illustrate the point.


Example 7.5

Consider a straight line OA bisected at B. What is the probability that a point chosen
at random on the line falls into the segment OB?

Let us suppose in the first place that the line is divided into n equal segments of length
OA/n. If we interpret the choosing of a point at random to mean the choice of one of these
intervals, the probability is obviously 1/2 as n → ∞, for there will be half the intervals
in the segment OB.

Now let OP be drawn perpendicular to OA and equal to it in length, and imagine a star
of n + 1 lines drawn through P, including OP and PA, so as to divide the angle OPA
into equal angles $\pi/4n$. These lines cut off segments on OA, and we may, if we regard
equal angles as having equal probability, assign to these segments an equal probability,
for they subtend equal angles at P. If we make this convention it is evident that as
n → ∞ the probability of a point falling into any segment on OA is proportional to the
angle subtended at P. For example, the probability that a point falls in the segment OB
is $\frac{4}{\pi} \tan^{-1} \tfrac{1}{2}$.

Now this is not the same answer that we got by assuming all small segments of OA
equally probable. There is nothing paradoxical in this: the two answers are different
because the two limiting processes were different. On a little reflection it will be clear
that by moving the point P on the perpendicular to OA and taking a star of lines as before
we can make the probability of obtaining a point in OB have any value we like. It is
thus abundantly clear that the concept of probability in a continuum depends on the limiting
process by which that continuum is reached from a finite subdivision of equiprobable
intervals.
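The two limiting processes of this example can be compared numerically. The following Python sketch is an illustration only; it takes OA = 1, counts a segment or angular sector as falling in OB according to its midpoint or bisecting ray, and treats the distance of P from O as a parameter `height` (a name of my own) so that the effect of moving P along the perpendicular can be seen.

```python
import math


def prob_equal_lengths(n):
    """OA = 1 divided into n equal segments; proportion whose midpoint lies in OB."""
    return sum(1 for k in range(n) if (k + 0.5) / n < 0.5) / n


def prob_equal_angles(n, height=1.0):
    """Angle OPA divided into n equal angles, P at distance `height` from O.

    Returns the proportion of angular sectors whose bisecting ray meets OA
    inside OB; this tends to atan(0.5 / height) / atan(1 / height).
    """
    total_angle = math.atan(1.0 / height)
    inside = 0
    for k in range(n):
        theta = (k + 0.5) * total_angle / n
        if height * math.tan(theta) < 0.5:
            inside += 1
    return inside / n


print(prob_equal_lengths(1000))
print(prob_equal_angles(100000), 4.0 / math.pi * math.atan(0.5))
print(prob_equal_angles(100000, height=0.1))
print(prob_equal_angles(100000, height=50.0))
```

With P close to O the probability assigned to OB approaches unity, and with P far from O it approaches one-half, in line with the remark that the value depends on where P is placed.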



7.14. We have spoken above of the selection of objects “at random”. In the
mathematical theory of probability it is customary to define randomness in terms of probability
itself. A member of a population is said to be chosen at random if it is chosen by
a random method; and a random method is one which makes it equally probable that
each member of the population will be chosen. Randomness is extremely important in
the theory of sampling and we shall consider it at some length in the next chapter. At
this point it is sufficient to note that when we speak of random choice we really mean
a method of selection which gives to certain propositions an equal probability and hence
allows us to apply the calculus of probability a priori. The justification for this is, in the
ultimate analysis, empirical. It is found in practice that there exist selective processes
which educe members of a population in such a way that the constituent events may be
regarded as equiprobable; and the theory of sampling is largely concerned with samples
generated by such processes.

It may be noted that, for continuous probabilities, randomness is dependent on the 
process to the limit just as probability itself is. 

The Approach of von Mises 

7.15. Suppose now we have a population of objects, each of which bears one of
a number of characteristics. To simplify the exposition we will suppose that there are
two characteristics denoted by 0 and 1. Suppose we draw members from this population
and replace each member after drawing. Then the process of continued selection will
generate a series such, for example, as

K = 01100100111010111100100 . . . . (7.9)

Von Mises (1936) takes as the foundation of his theory of probability an infinite sequence
of this kind, the Irregular Kollektiv, obeying the following laws:

(a) The proportion of 0’s in the first n terms tends to a limit as n → ∞. This limit
is called the probability of the zero in the Kollektiv.

(b) If a subsequence is picked out of the Kollektiv by some method which is independent
of the Kollektiv itself (e.g. every third member, every member whose ordinal is
a square, every member following a zero, etc.), the proportion of zeros in the subsequence
also tends to the same limit as n → ∞; and this for every such subsequence.

The Irregular Kollektiv might, in fact, be described as the infinite random series. It
has no systematic qualities; for if, for example, the series consisted of repetitions of
0110, thus

K = 011001100110 . . . . (7.10)

the subsequence consisting of every (4r + 3)th member would consist entirely of unities and the
condition (b) would be violated.
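The failure of the periodic series to satisfy the second condition can be exhibited on a finite stretch of it. The Python sketch below is an illustration only; a finite prefix can suggest, but of course cannot establish, the limiting behaviour of an infinite sequence. The names are my own.

```python
from itertools import cycle, islice


def proportion_of_zeros(terms):
    """Proportion of 0's among the given terms."""
    terms = list(terms)
    return terms.count(0) / len(terms)


# First 4000 terms of the periodic series 0110 0110 0110 ...
periodic = list(islice(cycle([0, 1, 1, 0]), 4000))

# Members whose ordinal (counting from 1) is 3, 7, 11, ...
every_4r_plus_3 = periodic[2::4]

print(proportion_of_zeros(periodic))
print(proportion_of_zeros(every_4r_plus_3))
```

The whole stretch has zeros in half its places, while the selected subsequence has none, so the two proportions cannot tend to a common limit.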

7.16. It is not difficult to show that probability defined in this way obeys the four rules 
enunciated earlier in this chapter. Some authorities have, however, found difficulty in 
accepting the basic concept of the Irregular Kollektiv and attributing any meaning to its 
existence. It has even been claimed that the idea is self-contradictory, though this von Mises 
strongly contests. 

However this may be, the von Mises approach represents, in my own opinion, the nearest 
to a satisfactory basis of the frequency theory of probability that has been given. The 
mathematics of the subject are much the same in any of the frequency theories once the 
fundamental rules have been established, but when it comes to relating theory to experience 
the von Mises method has decided advantages. For a discussion of this subject, reference 
may be made to the works listed at the end of this chapter; in particular I have given (1941) 
the outline of a theory which in my view eliminates the difficulties associated with the 
Irregular Kollektiv. 

Probability and Statistical Distribution 

7.17. We now proceed to consider the relationship between the theory of probability
and that of statistical distributions. Suppose we have a statistical population, finite and
discontinuous, distributed according to a variate x. If we take a member at random from
this population the probability that it bears an assigned variate-value is the frequency
function $f(x)$ at that value, for this is the proportion of members bearing that value. Further, the
probability that it bears a value less than or equal to $x_0$ is the distribution function $F(x_0)$,
as follows at once from Rule 3 and the definition of the distribution function.

This is the essential link between probabilities and distributions. The distribution 
function gives the probability that a member of the population chosen at random will bear 
a specified value of the variate or less. We must, however, consider whether this statement 
can still be regarded as true for populations which are infinite or continuous. 

Suppose in the first instance that the population is infinite and discontinuous. In such 
a case we cannot select a member at random, but we may, in the manner of 7.13, imagine 
a selection from a finite population which tends to the infinite form under consideration. In 
this finite population the proportion of members with values less than or equal to some x_0
will be F(x_0) and thus, with due regard to the nature of the limiting process, we may still say
that in the infinite population the probability of a value less than or equal to x_0 is F(x_0).

Similarly for a continuous distribution. In Chapter 1 we considered the continuous 
form as a limiting expression of 

\[ \Delta F = f(x)\,\Delta x. \]

If a member is chosen at random from this population in such a way that equal ranges Δx
are equally probable, the probability that it falls in the range Δx is f(x) Δx. In the limit we
may say that the probability of obtaining a value less than or equal to x_0 in taking a member
at random from a continuous population is

\[ \int_{-\infty}^{x_0} dF = F(x_0). \]

It must, however, be remembered that the nature of the process to the limit should be specified.

Hereafter, in speaking of selecting a member at random from a population dF = f(x) dx 
we shall assume that what is meant is a selection random in the limit for intervals dx, i.e. 
such that intervals dx are equally probable. 

The Concept of Random Variable

7.18. The idea of a variable x which can appear with varying degrees of probability 
dF = f(x) dx has been elevated by mathematicians into a distinct concept, that of a random
variable. In ordinary analysis no such idea appears. We write a variable x meaning 
that we are considering propositions about numbers which may be any of a certain range ; 
there is no thought that one of these values is to be considered more frequently than others 
or that it will occur more frequently in practice. The random variable, on the other hand, 
is to be regarded as defined by a distribution function. It may take any values in a given 
range, but the values are distinguished by an associated function. 

7.19. Let us consider what is meant by the addition of random variables. In ordinary 
analysis, given two variables x and y, we may define a third variable 

\[ z = x + y, \]

which merely means that when x = x_0 and y = y_0, z will be x_0 + y_0. If x and y are random
variables, can we attach any useful meaning to z?

If the joint distribution function of x and y is F_{12}, we have that the frequency of x ≤ x_0
and y ≤ y_0 is F_{12}(x_0, y_0). Consider some value z_0. We may then determine from F_{12} the
frequency such that x + y ≤ z_0, which will, in fact, be the integral

\[ \iint dF_{12}(x, y) \]

taken over the region for which x + y ≤ z_0.

This integral defines a function of z_0 which is in fact a distribution function, for it is zero
at −∞, non-decreasing, and unity at +∞. We may then define this as the distribution
function of the random variable z and say that z is the sum of the random variables x and y.

7.20. More generally, suppose we have n random variables distributed in the multivariate
form dF(x_1 . . . x_n). We may then define a random variable z by a functional
equation

\[ z = z(x_1, \ldots, x_n). \qquad (7.11) \]

The distribution function of z at z_0 is the integral of dF(x_1 . . . x_n) over all values of
x_1 . . . x_n such that z_0 ≥ z(x_1 . . . x_n). We may regard the equation (7.11) as defining a new
random variable z with this as its distribution function.
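
As a rough numerical illustration of 7.19 and 7.20, the following sketch estimates the distribution function of z = z(x_1 . . . x_n) at an assigned z_0 by repeated random drawing, counting the proportion of drawings for which z does not exceed z_0. The function names, the choice of two independent variates uniform on (0, 1) and the number of trials are assumptions made for the illustration only; for that choice and z = x + y the exact value at z_0 = 1 is ½.

```python
import random

def distribution_function_of_z(z, samplers, z0, trials=100_000, seed=0):
    """Estimate P(z(x_1, ..., x_n) <= z0) by simulation.

    `samplers` is a list of functions, one per variate; each takes a
    random.Random instance and returns one independent draw.
    """
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        xs = [draw(rng) for draw in samplers]
        if z(*xs) <= z0:
            hits += 1
    return hits / trials

def uniform01(rng):
    # Hypothetical population for the illustration: uniform on (0, 1).
    return rng.random()

# Distribution function of the sum z = x + y at z0 = 1.
estimate = distribution_function_of_z(
    lambda x, y: x + y, [uniform01, uniform01], z0=1.0
)
print(estimate)  # close to 0.5, up to sampling error
```

Any other functional form z(x_1 . . . x_n) may be passed in the same way; only the region of integration, here replaced by a count, changes.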

Sampling Distributions 

7.21. We have noted that if a member of a population is chosen at random, the 
probability that it will bear a variate-value not greater than x is the distribution function 
F(x). Similarly, if we choose a member from a multivariate population, the probability
that it will bear a value of the first variate not greater than x_1, of the second not greater
than x_2, . . . of the nth not greater than x_n, is the multivariate distribution function
F(x_1 . . . x_n). Further, if the variates are independent, as defined in 1.33, the rth
variate being distributed as dF_r(x_r), this probability is equal to

\[ F_1(x_1)\,F_2(x_2) \cdots F_n(x_n). \]

Now suppose that we have a selective process, which we will call sampling, applied to
a univariate population in such a way that it abstracts a group of n members. If this process
is repeated it will generate a multivariate distribution, each sample exhibiting n values
x_1 . . . x_n. The nature of this multivariate distribution depends on the sampling process
as well as the population. If the distribution is G(x_1 . . . x_n), then this function represents
the probability that a random sample will result in n values, the first not greater than x_1,
the second not greater than x_2, and so on.

There is one type of sampling process of outstanding importance in statistical theory, 
namely that in which the distribution G(x_1 . . . x_n) is the product of factors G_1(x_1),
G_2(x_2) . . . G_n(x_n). In such a case the sampling is said to be simple. The distributions of
the values x_1 . . . x_n are independent one of another, and we may thus say that the selection
of any member is independent of that of any other. Moreover, if the sampling is random,
every G_r(x) will be equal to F(x), the distribution function of the population. Thus in this
case we have, for the distribution of the variate-values in samples of n obtained by a simple
random method,

\[ dF(x_1, \ldots, x_n) = dF(x_1)\,dF(x_2) \cdots dF(x_n)
   = f(x_1)\,f(x_2) \cdots f(x_n)\,dx_1\,dx_2 \cdots dx_n, \qquad (7.12) \]

and F(x_1) F(x_2) . . . F(x_n) is the probability that in such a sample the first value will not
exceed x_1, and so on. Moreover, since the x’s appear symmetrically in (7.12) their order is
not material. The equation gives the probability that one member of the sample will not
exceed x_1, another x_2, and so on.

7.22. Suppose now we have a sample of n members of the population with variate-values
x_1 . . . x_n. We may construct from these values some function, say

\[ z = z(x_1, \ldots, x_n), \qquad (7.13) \]

which might, for example, be the mean or the variance. We may then ask: on certain
hypotheses as to the way in which the sample was derived, what is the probability that z is
not greater than some assigned value z_0? In terms of frequency, if all possible samples
x_1 . . . x_n were drawn and z computed for each of them, what proportion would fail to
exceed some value z_0?

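As an illustration, suppose we draw a sample of two from the normal population

\[ dF = \frac{1}{\sigma\sqrt{2\pi}} \exp\left(-\frac{x^2}{2\sigma^2}\right) dx. \]

Let the sampling be simple and random. Then in virtue of (7.12) the probability of values in
the ranges centred at x_1 and x_2 is

\[ dP = \frac{1}{2\pi\sigma^2} \exp\left\{-\frac{1}{2\sigma^2}(x_1^2 + x_2^2)\right\} dx_1\,dx_2. \qquad (7.14) \]

Consider now the quantity

\[ z = \frac{x_1 + x_2}{2}. \]

What is the probability that z shall be not greater than some assigned z_0? It is seen to be
the integral of dP in equation (7.14) over the region such that ½(x_1 + x_2) ≤ z_0, i.e.

\[ P(z \le z_0) = \frac{1}{2\pi\sigma^2} \iint_{\frac12(x_1 + x_2) \le z_0}
   \exp\left\{-\frac{1}{2\sigma^2}(x_1^2 + x_2^2)\right\} dx_1\,dx_2. \]

Write

\[ z = \tfrac12(x_1 + x_2), \qquad y = \tfrac12(x_1 - x_2). \]

The integral becomes

\[ \frac{1}{\pi\sigma^2} \int_{-\infty}^{z_0} \int_{-\infty}^{\infty}
   \exp\left\{-\frac{z^2 + y^2}{\sigma^2}\right\} dy\,dz
   = \frac{1}{\sigma\sqrt{\pi}} \int_{-\infty}^{z_0} e^{-z^2/\sigma^2}\,dz. \qquad (7.15) \]

Thus

\[ P(z_0 - \tfrac12 dz_0 \le z \le z_0 + \tfrac12 dz_0)
   = \frac{1}{\sigma\sqrt{\pi}}\, e^{-z_0^2/\sigma^2}\,dz_0, \qquad (7.16) \]

a result which, remembering the relation between probability and the distribution function,
we may express by saying that z is distributed normally with variance σ²/2. The distribution
function of the statistic z is given by (7.15) and its frequency function by (7.16).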
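
The result may be checked roughly by simulation. The sketch below draws many simple random samples of two from a normal population with zero mean, forms z for each, and compares the observed variance of z with σ²/2 and the observed proportion of values not exceeding z_0 with (7.15), which can be written as ½{1 + erf(z_0/σ)}. The value σ = 2, the point z_0 = 1, the number of samples and the function names are assumptions chosen for the illustration.

```python
import math
import random

def simulate_means_of_two(sigma, trials=200_000, seed=1):
    """Draw `trials` simple random samples of two and return z = (x1 + x2) / 2."""
    rng = random.Random(seed)
    return [(rng.gauss(0.0, sigma) + rng.gauss(0.0, sigma)) / 2 for _ in range(trials)]

def distribution_function_7_15(z0, sigma):
    # (1 / (sigma * sqrt(pi))) * integral from -inf to z0 of exp(-z^2 / sigma^2) dz
    return 0.5 * (1.0 + math.erf(z0 / sigma))

sigma = 2.0
z_values = simulate_means_of_two(sigma)

mean_z = sum(z_values) / len(z_values)
variance_z = sum((z - mean_z) ** 2 for z in z_values) / len(z_values)
empirical_F = sum(1 for z in z_values if z <= 1.0) / len(z_values)

print(variance_z, sigma ** 2 / 2)                        # both near 2
print(empirical_F, distribution_function_7_15(1.0, sigma))
```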

7.23. In the more general case of a statistic z = z(x_1 . . . x_n) we see that the probability
of z ≤ z_0 is obtained by integrating the joint distribution of x_1 . . . x_n over the
domain of x’s such that z_0 ≥ z(x_1 . . . x_n). This gives us the distribution function of the
random variable z defined in terms of the random variables x_1 . . . x_n by the equation
z = z(x_1 . . . x_n). We shall develop this subject systematically in Chapter 10.

When the a;-values are chosen by a simple random process the distribution of z is called 
a simple random sampling distribution, or more shortly a sampling distribution. Unless 
otherwise specified the words “sampling distribution” are always to be taken to refer to
sampling under simple random conditions. 


Bayes' Theorem 

7.24. We now revert to the theory of probability. Suppose that q_1 . . . q_n are
alternative propositions and let H be the information available, p some additional information.
Then by Rule 4

\[ P(q_r p \mid H) = P(p \mid H)\,P(q_r \mid pH) = P(q_r \mid H)\,P(p \mid q_r H), \]

whence

\[ \frac{P(q_r \mid pH)}{P(q_r \mid H)} = \frac{P(p \mid q_r H)}{P(p \mid H)}. \]

Thus

\[ P(q_r \mid pH) = \frac{P(q_r \mid H)\,P(p \mid q_r H)}{P(p \mid H)}. \qquad (7.17) \]

Since the truth of one of the q’s is certain we have, summing for all q’s,

\[ P(p \mid H) = \sum_r P(q_r \mid H)\,P(p \mid q_r H), \qquad (7.18) \]

whence, from (7.17),

\[ P(q_r \mid pH) = \frac{P(q_r \mid H)\,P(p \mid q_r H)}{\sum_r P(q_r \mid H)\,P(p \mid q_r H)}, \qquad (7.19) \]

or, for variations in q_r,

\[ P(q_r \mid pH) \propto P(q_r \mid H)\,P(p \mid q_r H). \qquad (7.20) \]

This is Bayes’ Theorem. It states that the probability of q_r on data p and H is proportional
to the product of that of q_r on H and p on q_r and H.

The principal application of the theorem lies in reasoning from observed events to the 
hypothesis which may explain them. The theory of this subject is accordingly known as that 
of “inverse” probability. Suppose, in fact, that an event can be explained on the mutually
exclusive hypotheses q_1 . . . q_n, and let H be the data known before the event happens, so
that H is the basis on which we first judge the relative probabilities of the q’s. Now suppose
the event to happen. Then Bayes’ theorem states that the probability of q_r after it has
happened (i.e. on data H and p) varies as the probability before it happened multiplied by the
probability that it happens on data q_r and H. The probability P(q_r | pH) is therefore called
the posterior probability, P(q_r | H) the prior probability, and P(p | q_r H) will be called the
likelihood.

In this book the word “ likelihood ” will be used solely in this special sense. 


7.25. The practical use of Bayes’ theorem depends on a knowledge of the prior
probabilities. When they are known we can calculate and compare the posterior probabilities
of the hypotheses, and if we have to choose one in preference to others we choose the
one with the greatest posterior probability. But we are rarely, if ever, given the prior
probabilities. And this brings us to what is perhaps the most contentious point in the
modern theory of probability.

Bayes stated (though he appears to have felt more hesitation than most of his followers)
that if there was no known reason for supposing that the prior probabilities were different,
they were to be assumed equal. This is Bayes’ postulate, which is to be distinguished from
the theorem of (7.19). It immediately resolves the difficulty of applying the theorem, and
before discussing the postulate and describing other approaches to the matter, it may be
useful to give two examples of the use of the postulate in practical problems.


Example 7.6 


An urn contains four balls, which are known to be either (a) all white, or (b) two white
and two black. A ball is drawn at random and found to be white. What is the probability
that all the balls are white?

We have here two hypotheses, q_1 and q_2. On q_1 the probability of getting a white ball
is 1, on q_2 it is ½. From (7.19) we have

\[ P(q_1 \mid pH) = \frac{P(q_1 \mid H)}{P(q_1 \mid H) + \tfrac12 P(q_2 \mid H)}. \]

Now, in accordance with Bayes’ postulate we assume

\[ P(q_1 \mid H) = P(q_2 \mid H) = \tfrac12 \]

and find

\[ P(q_1 \mid pH) = \tfrac23. \]




We are thus led to prefer the hypothesis qi that all the balls are white, since this has the 
greater posterior probability. 
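
The arithmetic of Example 7.6 can be reproduced with a short sketch of (7.19). The function name is hypothetical, and exact fractions are used so that the result can be compared directly with the value found above; the equal priors are those assumed under Bayes’ postulate.

```python
from fractions import Fraction

def posterior_probabilities(priors, likelihoods):
    """Apply (7.19): posterior is prior times likelihood, divided by the sum over hypotheses."""
    joint = [prior * like for prior, like in zip(priors, likelihoods)]
    total = sum(joint)
    return [j / total for j in joint]

# Hypotheses: q_1 (all four balls white), q_2 (two white, two black).
priors = [Fraction(1, 2), Fraction(1, 2)]      # Bayes' postulate: equal priors
likelihoods = [Fraction(1), Fraction(1, 2)]    # probability of drawing a white ball

posteriors = posterior_probabilities(priors, likelihoods)
print(posteriors)  # [Fraction(2, 3), Fraction(1, 3)]
```

With unequal priors the same function applies unchanged, which makes plain how far the conclusion depends on the postulate.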


Example 7.7 

From an urn full of balls of unknown colour a ball is drawn at random and replaced.
The process is continued m times and a black ball is drawn each time. What is the probability
that if a further ball is drawn it will be black?

The question as framed does not admit of a definite answer, for, there being an infinite
number of possible colours and combinations of colours, we do not know what are the
hypotheses which are to be compared. Let us suppose that the balls are either black or
white, and thus consider the hypotheses (1) that all are black, (2) that all but one are black,
(3) that all but two are black, and so on. The problem still lacks precision, for the number
of balls is not specified. Suppose there are N balls. We shall later let N tend to infinity to
get the limiting case.

Consider the hypothesis q_R that there are R black balls and N − R white ones. The probability
of choosing a black ball is R/N, and that of doing so m times in succession, in virtue of
Rule 4, is

\[ \left(\frac{R}{N}\right)^m. \]

If the q’s have equal prior probabilities we have, from (7.19),

\[ P(q_R \mid pH) = \frac{(R/N)^m}{\sum_R (R/N)^m}. \]

Now the probability of getting a further black ball on hypothesis q_R is R/N. Since the
hypotheses q are mutually exclusive, the probability of getting a further black ball is, in
virtue of Rules 3 and 4,

\[ \frac{\sum_R (R/N)^{m+1}}{\sum_R (R/N)^m}. \]

This is the answer to the limited form of the question. As N → ∞ this tends to the quotient
of definite integrals

\[ \frac{\int_0^1 x^{m+1}\,dx}{\int_0^1 x^m\,dx} = \frac{m+1}{m+2}. \]

This is a particular case of the so-called Succession Rule of Laplace. Enthusiasts have
applied it indiscriminately in some such unconditioned form as the statement that if an event
is observed to happen m times in succession the chances are m + 1 to 1 that it will happen
again. This is clearly unjustified.
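
The passage to the limit in Example 7.7 can be watched numerically. The sketch below evaluates the quotient of sums for a finite N and compares it with (m + 1)/(m + 2). The function names are hypothetical, and the summation is taken over R = 0, 1, . . . N, which is an assumption about the range of hypotheses admitted.

```python
from fractions import Fraction

def probability_of_further_black(m, N):
    """Quotient of sums from Example 7.7 for an urn of N balls, equal priors on R = 0..N."""
    numerator = sum(Fraction(R, N) ** (m + 1) for R in range(N + 1))
    denominator = sum(Fraction(R, N) ** m for R in range(N + 1))
    return numerator / denominator

def limiting_value(m):
    # Limit as N tends to infinity: (m + 1) / (m + 2).
    return Fraction(m + 1, m + 2)

m = 3
for N in (2, 10, 100, 1000):
    print(N, float(probability_of_further_black(m, N)), float(limiting_value(m)))
```

For small N the finite answer differs appreciably from the limit, which is one reason for caution in quoting the rule without its conditions.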


7.26. The principal difficulties arising out of Bayes’ postulate appear from the standpoint
of the frequency theory of probability. If we adopt the axiomatic approach, in which
probability is a measure of attitudes of mind, it is reasonable to take prior probabilities to be
equal when nothing is known to the contrary, for the mind holds them in equal doubt. The
frequency theory, however, would require the states of events corresponding to the various
q’s to be distributed with equal frequency in some population from which the actual q has
emanated, if Bayes’ postulate is to be applied. This has appeared to some statisticians,
though not to all, to be asking too much of the universe. The postulate is one of the crucial
points in the theory of probability. Adherents of the axiomatic school accept it. Many of
those of the frequency school explicitly reject it.

There is still so much disagreement on this subject that one cannot put forward any set
of viewpoints as orthodox. One thing, however, is clear: anyone who rejects Bayes’
postulate must put something in its place. The problem which Bayes attempted to solve is 
supremely important in scientific inference and it scarcely seems possible to have any 
scientific thought at all without some solution, however intuitive and however empirical, to 
the problem. We are constantly compelled to assess the degree of credence to be accorded 
to hypotheses on given data; the struggle for existence, in Thiele’s phrase, compels us to 
consult the oracles. 

The Principle of Maximum Likelihood 

7.27. The school of statisticians which rejects Bayes’ postulate has substituted for it 
an apparently different principle based on the use of likelihood. Reverting to equation (7.19)
we see that for any q_r and H

\[ P(q_r \mid pH) \propto P(q_r \mid H)\,L(p \mid q_r H), \qquad (7.21) \]

where we now write L(p | q_r H) for the likelihood function. The Principle of Maximum
Likelihood states that when confronted with a choice of hypotheses q we are to select that one
(if it exists) which maximises L(p | q_r H). In other words, we are to choose the hypothesis
which gives the greatest probability to the observed event.

It is to be particularly noted that this is not the same thing as choosing the hypothesis
with the greatest probability. In fact, some adherents of the frequency theory of probability
deny any meaning to the expression “probability of a hypothesis”, and the principle of
maximum likelihood was introduced largely to replace the notion of “inverse” probability
which leads to the use of such a phrase.


7.28. Suppose (as is nearly always the case in statistical work) that the hypotheses
with which we are concerned assert something about the numerical value of a parameter θ.
In such a case we shall speak of a statistical hypothesis. For instance, the hypotheses might
be q_1 ≡ θ ≤ 0, q_2 ≡ θ > 0, in which case there are two alternatives. Or we might have
q_1 ≡ θ = 1, q_2 ≡ θ = 2, and so on, in which case there is a denumerable infinity of
hypotheses.

If now θ can have only discontinuous values, we may, confronted with an observed
event p, require to estimate θ, or to ask what is the “best” value of θ to take on the evidence
p. The method of Bayes would state that the “best” value was the most probable value.
In (7.21) we should seek for that q_r which made P(q_r | pH) a maximum. If we know nothing
of the prior probabilities P(q_r | H) we should, in accordance with Bayes’ postulate, assume
all such probabilities equal. We then merely have to find that q_r which maximises L(p | q_r H).
In other words, the postulate of Bayes and the principle of maximum likelihood result in the
same answer and are equivalent.





7.29. This position apparently does not hold if the permissible values of θ are continuous.
We must now replace such expressions as P(q_r | H) by P(θ_0 − ½dθ_0 ≤ θ ≤ θ_0 + ½dθ_0 | H)
and in place of (7.21) we get

\[ P(\theta_0 - \tfrac12 d\theta_0 \le \theta \le \theta_0 + \tfrac12 d\theta_0 \mid pH)
   \propto P(\theta_0 - \tfrac12 d\theta_0 \le \theta \le \theta_0 + \tfrac12 d\theta_0 \mid H)
   \times L(p \mid \theta_0 - \tfrac12 d\theta_0 \le \theta \le \theta_0 + \tfrac12 d\theta_0,\ H). \qquad (7.22) \]

If we now require the “best” value of θ, we should, in accordance with Bayes’ postulate,
take the prior probability to be a constant and once again we should have to maximise L for
variations of θ.

We might, however, have chosen to represent our hypotheses, not by θ, but by some
variate φ functionally related to θ, e.g. the standard deviation instead of the variance. In
this case we should have reached equation (7.22) with φ written everywhere instead of θ;
we should have taken the prior probability as constant; and we should have arrived at
the conclusion that we should maximise L for variations of φ.

But are we being consistent in so doing? If we assume that the elementary intervals
of θ are equiprobable we cannot assume the same of φ, and thus the use of Bayes’ postulate
appears to involve self-contradiction. The principle of maximum likelihood is free from
this difficulty, for if L(θ) is to be maximised for variations of θ it will, at the same time, be
maximised for variations of φ, since

\[ \frac{dL}{d\theta} = \frac{dL}{d\phi}\,\frac{d\phi}{d\theta} \]

and the two sides of this equation vanish together.
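
The invariance just described may be illustrated numerically. The sketch below takes a small set of observations supposed drawn from a normal population with zero mean, maximises the likelihood once with the variance as parameter and once with the standard deviation as parameter, and confirms that the two maxima correspond. The data, the function names and the use of a golden-section search are assumptions of the illustration, not part of the argument above.

```python
import math

def log_likelihood_sd(sd, data):
    """Log-likelihood (constant terms dropped) of a zero-mean normal sample, parameter sd."""
    n = len(data)
    return -n * math.log(sd) - sum(x * x for x in data) / (2 * sd * sd)

def log_likelihood_var(var, data):
    """The same likelihood expressed in terms of the variance."""
    return log_likelihood_sd(math.sqrt(var), data)

def golden_section_maximum(f, lo, hi, tol=1e-8):
    """Locate the maximum of a unimodal function f on [lo, hi]."""
    ratio = (math.sqrt(5) - 1) / 2
    a, b = lo, hi
    while b - a > tol:
        c = b - ratio * (b - a)
        d = a + ratio * (b - a)
        if f(c) < f(d):
            a = c   # the maximum lies to the right of c
        else:
            b = d   # the maximum lies to the left of d
    return (a + b) / 2

data = [1.2, -0.7, 0.3, -1.9, 0.8]   # hypothetical observations

var_hat = golden_section_maximum(lambda v: log_likelihood_var(v, data), 0.01, 100.0)
sd_hat = golden_section_maximum(lambda s: log_likelihood_sd(s, data), 0.1, 10.0)

print(var_hat, sd_hat ** 2)   # the two agree, both near 1.254
```

Here the maximising variance is the mean square of the observations, and the maximising standard deviation is its square root, whichever parameter is varied.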


7.30. This is one of the grounds on which adherents of the frequency school have
rejected Bayes’ postulate in favour of the principle of maximum likelihood; but in my view
the matter has been misunderstood. It would seem that Bayes’ postulate and the principle
give the same answer in the continuous case as well as in the discontinuous case when proper
regard is had to the limiting processes involved. We saw in 7.13 that in speaking of
probability in a continuum it was essential to specify the nature of the process to the limit.
If we regard θ (from the frequency viewpoint) as having emanated from a population by
a process random in the limit for intervals dθ, then Bayes’ postulate applied to this process
will clearly give a different answer from that obtained by supposing that θ emanated by
a process random in the limit for the different intervals dφ, just as the probabilities
in Example 7.5 are different, and for the same reason. Thus the apparent inconsistency
is not an inconsistency at all, but a difficulty introduced by ignoring the limiting
process in continuous populations.*

* A further difficulty arises if θ can lie in an infinite range, for then Bayes’ postulate apparently
leads to the conclusion that prior probabilities in any finite range are zero and hence so are posterior
probabilities. This does not arise in the likelihood method. Jeffreys overcomes it by assuming that the
prior probability in such a case is inversely proportional to the parameter θ. Looking at the problem
generally, we need not be surprised that the difficulty appears since the ranging of θ over an infinite
range is also a limiting process. In practice we are never so ignorant a priori as to suppose that θ can
be any value however large with the same probability, and if we consider the range as determinate
but unknown, likelihood and Bayes’ postulate continue to be applicable and to give the same results.

For an extended discussion of this subject reference may be made to Kendall (1940).
In the present volume it need not concern us to take it farther, though considerable use will
be made of the principle of maximum likelihood in Volume 2. It will there be seen that the
principle has many important statistical properties. No one, in fact, denies the importance
of the principle or its usefulness in certain cases; the controversy hitherto has centred on
the considerations by which the acceptance of the principle as a rule of conduct is to be justified.
The reader who cannot accept Bayes’ postulate and the foregoing argument that it is
virtually identical with the principle has a choice of courses. He can accept the principle
as a new and distinct postulate of scientific inference; he can regard it as justified by its
mathematical and statistical properties; or he can rely on a more sophisticated approach
which will be touched on in Chapter 9, namely, that the principle leads to estimates of
parameters with minimum sampling variance when such exist. At this stage he may be
prepared to accept it on intuitive grounds.*

7.31. Although in the remainder of the present volume Bayes’ postulate and the 
principle of maximum likelihood will not often appear explicitly, we shall frequently use 
a type of argument which is, in the ultimate analysis, based on them. A certain event or 
series of events is observed ; on a hypothesis H the occurrence of these events is found to be 
highly improbable ; and therefore H is rejected in favour of some hypothesis which makes 
the observations more probable. To take a very simple example, we toss a penny twenty 
times and find that it comes down heads every time. If the penny were unbiased 
(hypothesis H) the odds against this event would be 2^20 − 1 to 1. Thus we reject H in
favour of the hypothesis that it is in fact biased in favour of the heads. 
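
The odds quoted for the penny follow from the probability (½)^20 of twenty heads in succession on hypothesis H. A minimal sketch, with a hypothetical helper name and exact fractions:

```python
from fractions import Fraction

def odds_against(probability):
    """Odds against an event of given probability, expressed as (1 - p) / p to 1."""
    return (1 - probability) / probability

p_twenty_heads = Fraction(1, 2) ** 20   # unbiased penny, twenty independent tosses
odds = odds_against(p_twenty_heads)
print(odds)  # 1048575, i.e. 2**20 - 1
```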

It will readily be seen that this type of argument is a somewhat indefinite form of the 
inverse type with which we have been concerned. The chief difference lies in the fact that 
it is used to reject unlikely hypotheses rather than to accept the most likely, possibly a safer 
but certainly a less precise procedure. 


The Central Limit Theorem 


7.32. To conclude this chapter we prove an important theorem which gives the normal 
distribution a central place in the theory of probability and the theory of sampling. It has 
already been shown that the distribution appears as the limiting form of the binomial and 
the Pearson Type III distribution when expressed in standard measure. We shall prove 
a much more general result, due to Laplace but first proved rigorously by Liapounoff, that 
under certain conditions the sum of n independent random variables distributed in whatever 
form tends, when expressed in standard measure, to the normal form as n tends to infinity. 
This is the famous Central Limit Theorem. 

Let us note in the first place a simple but powerful result connected with the character¬ 
istic functions of sums of independent random variables. If we have n such variables 
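distributed as dF_1 . . . dF_n the element of frequency of their sum z = x_1 + . . . + x_n is the
integral of dF_1 . . . dF_n through the element of volume between z and z + dz. Thus the
characteristic function of their sum, being the integral of e^{itz} through the range of z, is equal to

\[ \int_{-\infty}^{\infty} \cdots \int_{-\infty}^{\infty} e^{it(x_1 + \cdots + x_n)}\,dF_1 \cdots dF_n
   = \int_{-\infty}^{\infty} e^{itx_1}\,dF_1 \int_{-\infty}^{\infty} e^{itx_2}\,dF_2 \cdots \int_{-\infty}^{\infty} e^{itx_n}\,dF_n
   = \phi_1 \phi_2 \cdots \phi_n. \]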


* An approach of a rather different kind has been developed in recent years by Neyman (1937),
who bases his theory of inference only on direct probabilities. An account of this theory will be given
in the second volume.


That is to say, the characteristic function of the sum of a number of independent random
variables is the product of their characteristic functions. The cumulative function is 
accordingly the sum of their cumulative functions. 

Now as to the Central Limit Theorem itself. We first of all outline the proof briefly
and unrigorously to indicate its essential features, and then give a rigorous proof. Suppose
we have distributions F_1 . . . F_n, all with finite second moments and with characteristic
functions φ_1 . . . φ_n. We have for any F_r

\[ \phi_r(t) = 1 + \mu'_{1,r}(it) + \tfrac12 \mu'_{2,r}(it)^2 + R_r, \]

where R_r is a remainder term. Similarly we have

\[ \psi_r(t) = \mu'_{1,r}(it) + \tfrac12 \mu_{2,r}(it)^2 + R'_r. \]

Hence the cumulative function of the sum of the independent variates will be

\[ \psi(t) = (it)\sum \mu'_{1,r} + \tfrac12 (it)^2 \sum \mu_{2,r} + \sum R'_r. \]

We can without loss of generality take the mean of the sum as origin, so that Σμ'_{1,r} = 0, and
now transforming to standard measure by the transformation ξ = x/√(Σμ_{2,r}) we find

\[ \psi(t) = -\tfrac12 t^2 + \sum_r R'_r\!\left(\frac{t}{\sqrt{\sum \mu_{2,r}}}\right). \]

Since Σμ_{2,r} is of order n, and each R'_r is of the third order in its argument, the remainder
term will be of order n/n^{3/2}, i.e. of order n^{-1/2}; and thus tends to zero. We shall then have

\[ \lim_{n \to \infty} \psi(t) = -\tfrac12 t^2, \qquad \lim_{n \to \infty} \phi(t) = e^{-t^2/2}, \]

and hence in virtue of the converse of the First Limit Theorem (4.12) the distribution of
the sum of the random variables tends to normality.
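
The tendency to normality can be seen in a small simulation. The sketch below sums n independent variates uniform on (0, 1), expresses the sum in standard measure, and compares the observed proportion of values not exceeding 1 with the normal distribution function at that point. The uniform population, n = 30, the number of trials and the function names are assumptions chosen for the illustration.

```python
import math
import random

def standardised_sum(n, rng):
    """Sum of n independent uniform(0, 1) variates, expressed in standard measure."""
    total = sum(rng.random() for _ in range(n))
    mean = n * 0.5            # each variate has mean 1/2
    variance = n / 12.0       # each variate has variance 1/12
    return (total - mean) / math.sqrt(variance)

def normal_distribution_function(x):
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

rng = random.Random(2)
trials = 50_000
draws = [standardised_sum(30, rng) for _ in range(trials)]
empirical = sum(1 for d in draws if d <= 1.0) / trials

print(empirical, normal_distribution_function(1.0))  # both near 0.84
```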


7.33. The rigorous enunciation of the theorem and its proof are as follows:

If n independent random variables are distributed in the forms F_1 . . . F_n with finite
variances μ_{2,1} . . . μ_{2,n}, and M_n = Σ_{j=1}^{n} μ_{2,j}, then the sum of the variables divided by √M_n
tends to the normal form, provided that for any ε > 0

\[ \lim_{n \to \infty} \frac{1}{M_n} \sum_{j=1}^{n} \int_{|x| > \epsilon\sqrt{M_n}} x^2\,dF_j = 0. \qquad (7.23) \]

The implications of this condition, which is a modification by Cramer of one due to
Lindeberg, are not very obvious, but it involves that

\[ M_n \to \infty \quad \text{and} \quad \frac{\mu_{2,n}}{M_n} \to 0, \qquad (7.24) \]

in other words, that the total variance tends to infinity but that the proportional contribution
of each constituent tends to zero. To see that (7.24) follows from (7.23) we note that if M_n
does not tend to infinity it must, being an increasing function, tend to a constant. It would
follow from (7.23) that the sum of the integrals, each of which is positive and not small for
every ε, would tend to zero, which is impossible. Further, if μ_{2,n}/M_n did not tend to zero, then
at least one of the terms in (7.23) would not do so, and thus the sum would not do so.
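We have

\[ \phi_j\!\left(\frac{t}{\sqrt{M_n}}\right) = \int_{-\infty}^{\infty} e^{itx/\sqrt{M_n}}\,dF_j
   = 1 + \int_{-\infty}^{\infty} \left(e^{itx/\sqrt{M_n}} - 1\right) dF_j. \]

Expanding the exponential with a Maclaurin remainder we have

\[ \phi_j\!\left(\frac{t}{\sqrt{M_n}}\right) = 1
   + \int_{|x| \le \epsilon\sqrt{M_n}} \left( \frac{itx}{\sqrt{M_n}} - \frac{t^2x^2}{2M_n} + \theta\,\frac{t^3x^3}{6M_n^{3/2}} \right) dF_j
   + \int_{|x| > \epsilon\sqrt{M_n}} \left( \frac{itx}{\sqrt{M_n}} + \theta'\,\frac{t^2x^2}{2M_n} \right) dF_j, \]

\[ 0 \le |\theta|,\ |\theta'| \le 1. \]

We may without loss of generality suppose the mean to be zero and hence we find

\[ \phi_j\!\left(\frac{t}{\sqrt{M_n}}\right) = 1
   - \frac{t^2}{2M_n} \int_{|x| \le \epsilon\sqrt{M_n}} x^2\,dF_j
   + \frac{t^3}{6M_n^{3/2}} \int_{|x| \le \epsilon\sqrt{M_n}} \theta x^3\,dF_j
   + \frac{t^2}{2M_n} \int_{|x| > \epsilon\sqrt{M_n}} \theta' x^2\,dF_j. \]

Thus for some T > 1 we have, for | t | < T, remembering that

\[ \int_{|x| \le \epsilon\sqrt{M_n}} |x^3\theta|\,dF_j \le \epsilon\sqrt{M_n} \int x^2\,dF_j, \]

\[ \phi_j\!\left(\frac{t}{\sqrt{M_n}}\right) = 1 - \frac{\mu_{2,j}t^2}{2M_n}
   + \theta'' T^3 \left( \frac{\epsilon\mu_{2,j}}{M_n} + \frac{1}{M_n} \int_{|x| > \epsilon\sqrt{M_n}} x^2\,dF_j \right),
   \qquad 0 \le |\theta''| \le 1. \]

Hence, in virtue of (7.24) the coefficient of θ'' is as small as we please and thus φ_j(t/√M_n)
tends to unity as n → ∞ uniformly for | t | < T. Thus we have

\[ \psi_j\!\left(\frac{t}{\sqrt{M_n}}\right) = (1 + \eta)\left\{ \phi_j\!\left(\frac{t}{\sqrt{M_n}}\right) - 1 \right\} \]

for sufficiently large n and | η | < ε. Thus for ε < 1,

\[ \psi_j\!\left(\frac{t}{\sqrt{M_n}}\right) = -\frac{\mu_{2,j}t^2}{2M_n}
   + 2\theta'' T^3 \left( \frac{\epsilon\mu_{2,j}}{M_n} + \frac{1}{M_n} \int_{|x| > \epsilon\sqrt{M_n}} x^2\,dF_j \right). \]

Summing for j we have, in virtue of (7.23),

\[ \psi(t) = -\tfrac12 t^2 + 2\theta'' T^3 (\epsilon + \text{vanishing quantity}) \]

and thus for | t | < T

\[ \lim_{n \to \infty} \psi(t) = -\tfrac12 t^2, \]

the convergence being uniform in any finite t-interval. The theorem follows from the converse
of the First Limit Theorem.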





7.34. The following comments will amplify the above proof.

(a) The Lindeberg condition (7.23) is necessary as well as sufficient. A proof is given by
Cramer (1937).

(b) The condition may be put in other forms, for which see Cramer (1937), Uspensky
(1937) and the original memoir by Liapounoff (1901).

(c) The sum of random variables whose distributions have not a finite second moment
may not tend to normality. It will be seen in Chapter 9 that the mean of n variables each of
which is distributed in the form

\[ dF = \frac{dx}{\pi(1 + x^2)}, \qquad -\infty < x < \infty, \]

is also distributed in that form, however large n may be.

(d) Liapounoff has also given some remarkable results showing how close the limiting
form is to the sum of n variables. In fact, if F_n is the distribution function of the sum,
and F that of the normal form,

\[ |F_n - F| < c\,\rho\,\frac{\log n}{\sqrt{n}}, \]

where c is a constant, ρ is a function of the third moments of the constituent distributions.
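
Comment (c) can be illustrated by simulation. The sketch below draws means of n variates from the form dF = dx/{π(1 + x²)} and records the proportion of means lying between −1 and +1. For a single variate of this form that proportion is ½, and if the mean is distributed in the same form it should remain near ½ however large n is taken. The method of drawing (the tangent of a uniformly distributed angle), the values of n, the number of trials and the function names are assumptions of the illustration.

```python
import math
import random

def cauchy_draw(rng):
    """One draw from dF = dx / (pi * (1 + x^2)), as the tangent of a uniform angle."""
    return math.tan(math.pi * (rng.random() - 0.5))

def fraction_of_means_within_one(n, trials, rng):
    """Proportion of sample means (samples of size n) falling in [-1, 1]."""
    inside = 0
    for _ in range(trials):
        mean = sum(cauchy_draw(rng) for _ in range(n)) / n
        if abs(mean) <= 1.0:
            inside += 1
    return inside / trials

rng = random.Random(3)
results = {n: fraction_of_means_within_one(n, 20_000, rng) for n in (1, 10, 100)}
print(results)  # each proportion stays near 0.5; the mean does not concentrate
```

Contrast this with a population of finite variance, for which the proportion of means within a fixed interval about the population mean would tend to unity as n increases.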


NOTES AND REFERENCES 

The logic of the theory of probability will be found dealt with in the books by Keynes 
(1921), F. P. Ramsey (1931) and Johnson (1921). All these take the axiomatic approach
from probability as an undefined idea. The frequency approach has been discussed from the
more logical angle by Venn (1888), whose book, though out of print and to some extent out of
date, is still worth reading.

The mathematical theory of probability has been treated by Levy (1925), Jeffreys
(1939) and Uspensky (1937), all three books excellent of their kind. Von Mises’ approach is
described in his book (1936) and an axiomatisation in a paper by Dorge (1934). See also
Kendall (1941).

For inverse probability and likelihood see the review by Kendall (1940). There are
scores of papers, mostly controversial in character, on this subject, but a beginning of
a systematic reading may be made with the papers by Fisher (1921, 1930), Neyman (1937),
and the book by Jeffreys (1939).

For the central limit theorem see Cramer (1937), and for an extension to the case when
the variables are dependent, Bernstein (1927).

Bayes, T. (1763), “An essay towards solving a problem in the doctrine of chances,” Phil.
Trans., 53, 370.

Bernstein, S. (1927), “Sur l’extension du théorème limite du calcul des probabilités aux
sommes de quantités dépendantes,” Math. Ann., 97, 1.

Borel, E. (editor) (1925 and later years), Traité du Calcul des Probabilités et de ses Applications,
Gauthier-Villars, Paris.

Cramer, H. (1937), Random Variables and Probability Distributions, Cambridge University
Press.

Dorge, K. (1934), “Eine Axiomatisierung der von Misesschen Wahrscheinlichkeitstheorie,”
Jber. dtsch. Mat. Ver., 43, 39.

Fisher, R. A. (1921), “On the mathematical foundations of theoretical statistics,” Phil.
Trans., A, 222, 309.

- (1930), “Inverse Probability,” Proc. Camb. Phil. Soc., 26, 528.

Jeffreys, H. (1939), The Theory of Probability, Oxford University Press.

Johnson, W. E. (1921-24), Logic (3 volumes), Cambridge University Press.

Kendall, M. G. (1940), “On the method of maximum likelihood,” J. Roy. Statist. Soc., 103,
388.

- (1941), “A theory of randomness,” Biometrika, 32, 1.

Keynes, J. M. (1921), A Treatise on Probability, Macmillan.

Kolmogoroff, A. (1933), Grundbegriffe der Wahrscheinlichkeitsrechnung, Berlin.

Lévy, P. (1925), Calcul des Probabilités, Gauthier-Villars, Paris.

Liapounoff, A. (1901), “Nouvelle forme du théorème sur la limite de probabilité,” Mém.
Acad. Sci. St. Pétersbourg, 12, No. 5.

von Mises, R. (1936), Wahrscheinlichkeit, Statistik und Wahrheit, Springer, Berlin. (English
translation, 1939, as Probability, Statistics and Truth. W. Hodge.)

Neyman, J. (1937), “Outline of a theory of statistical estimation based on the classical theory
of probability,” Phil. Trans., A, 236, 333.

Ramsey, F. P. (1931), The Foundations of Mathematics, Kegan Paul.

Uspensky, J. V. (1937), Introduction to Mathematical Probability, McGraw-Hill, New York
and London.

Venn, J. A. (1888), The Logic of Chance, Macmillan.


EXERCISES 


7.1. If each of an aggregate of N objects can possess or not possess any of n characteristics
A, B, . . . K; and if (ab . . . f) is the number of objects possessing A, B . . . F,
show that the number of objects possessing at least one of A, B, . . . K is

\[ \Sigma(a) - \Sigma(ab) + \Sigma(abc) - \cdots + (-1)^{n-1}\,\Sigma(ab \ldots k). \]

In each of a packet of cigarettes there is one of a set of cards numbered from 1 to n.
If a number N of packets is bought at random, the population of packets is large and the
numbers are equally frequent, show that the probability of getting a complete set of cards is

\[ 1 - \binom{n}{1}\left(\frac{n-1}{n}\right)^N + \binom{n}{2}\left(\frac{n-2}{n}\right)^N - \cdots
   + (-1)^{n-1}\binom{n}{n-1}\left(\frac{1}{n}\right)^N. \]


7.2. Three points are taken at random on a circle. Show that the probability that
they lie on the same semi-circle is 3/4. (Assume that in the limit elementary intervals of arc
are equiprobable.)

Explain the fallacy in the following argument: One pair of the points must
lie on a semicircle terminating at one of them. The probability that the third point lies
on this semicircle is 1/2, which is therefore the required answer.


7.3. A simple random sample of n values, x_1 . . . x_n, is drawn from the normal
population

$$dF = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{(x-m)^2}{2\sigma^2}}\, dx.$$

Show that the value of m which maximises the likelihood of this event is

$$m = \frac{1}{n}\,\Sigma(x),$$

which is therefore the “best” estimate of the mean of the population.

7.4. Show that if p is the probability of a zero in the Irregular Kollektiv the probability
that there will be r consecutive zeros in a set of n members chosen at random obeys
the recurrence relation

«»+i =«„ + (!- - p) 

and hence that 

— 1 f -j- f 

[A] , . , 


where 



CHAPTER 8 

RANDOM SAMPLING 


The Sampling Problem

8.1. In the previous chapter we have referred incidentally to the sampling problem,
which can be stated quite simply: given a sample from a population, to determine from it
the properties of that population. We noted that only in exceptional cases is it possible to
make assertions about the population with complete certainty, and that consequently it is
necessary to fall back on statements of a less categorical kind expressible in terms of
probability.


8.2. In order to be able to apply the theory of probability to this problem it is necessary
that the sampling should be random. In actual practice we often meet with samples which
are not random, having been chosen purposively for some reason or other. In such
circumstances it is not, as a rule, possible even to make precise statements in probability; and where
a decision has to be taken one is forced to rely on subjective judgments of an unsatisfactory
kind. No numerical estimate of the probabilities can be made. It is for this reason that
random sampling becomes of primary importance in statistical investigations from sample
to population. From this point onwards we shall deal only with random samples, and to
avoid constant repetition shall leave it to be understood that where “a sample” or
“a sampling distribution” is referred to, random conditions are assumed.

8.3. It is useful to begin a discussion of random sampling by considering the types of 
parent population from which samples can be chosen. 

(a) In the first place, the population may be finite and existent, e.g. the population of
human beings in Europe at a fixed point of time, or the population of apples on a given
tree. A sampling process which extracts members one at a time from this population
will evidently eventually exhaust the supply of members if continued long enough. Thus
the sampling, though random, is not simple in the sense of 7.21, for the probability of
a given member being chosen varies according to what has already been abstracted.

We may, however, reduce this process to one of simple sampling by replacing the
members after withdrawal. The population then remains the same at each trial. The two
cases are sometimes distinguished as “sampling without replacement” and “sampling with
replacement”.

Furthermore, we may also in many cases regard the sampling as simple to an adequate
approximation even when there is no replacement. If the population is large compared with
the size of the sample, the abstraction of relatively few members will not materially affect the
constitution of the remaining population, which may thus be regarded as approximately the
same for subsequent samplings.
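
The approximation just described can be checked numerically. The following is a minimal sketch, not part of the original argument: it assumes a population of N members of which half carry some mark, and compares the probability of finding 3 marked members in a sample of 10 when drawing without replacement and with replacement. The function names and the particular values of N are illustrative choices only.

```python
from math import comb

def p_without_replacement(N, K, n, k):
    """P(k marked members in a sample of n) when drawing without replacement
    from a population of N that contains K marked members."""
    return comb(K, k) * comb(N - K, n - k) / comb(N, n)

def p_with_replacement(N, K, n, k):
    """The same probability when each member is replaced after withdrawal,
    so that the chance of a marked member is K/N at every trial."""
    p = K / N
    return comb(n, k) * p**k * (1 - p) ** (n - k)

n, k = 10, 3          # sample size and number of marked members sought
gaps = []
for N in (20, 200, 20000):   # illustrative population sizes
    K = N // 2               # assume half the population is marked
    gap = abs(p_without_replacement(N, K, n, k) - p_with_replacement(N, K, n, k))
    gaps.append(gap)
    print(f"N = {N:>6}: difference between the two schemes = {gap:.6f}")
```

The difference shrinks as the population grows relative to the sample, which is the sense in which sampling without replacement may be treated as simple.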

(b) Sampling with replacement from a finite population may, in fact, be regarded as
sampling from an infinite population, for the process will never exhaust the supply. We
may, however, have to deal with a population which is infinite in rather a different sense,
namely, that of a limiting form. We may, for example, wish to consider the probability of
a sample from the positive integers or the real numbers from 0 to 1. The latter case presents
itself in sampling from a continuous frequency-distribution which we must necessarily regard
as infinite.




Thus, if we replace an observational distribution by a conceptual continuous mathematical
distribution, we replace at the same time a finite population by an infinite population.
The drawing of random samples from such a population is attended by the circumstances
referred to in 7.13 and 7.29, namely, that the process to the limit must be taken into account.

(c) Thirdly, the population may be purely hypothetical. Consider, for example, the 
throws of a die. We may picture the continual throwing as a sampling process drawing 
existent members from some non-existent population. In such cases what we are really 
doing is constructing by mental fiction an imaginary population round the sample. 

The concept of the hypothetical population is necessitated by ideas of frequency in
probability. It is not required (and indeed has been explicitly rejected by Jeffreys) in the 
approach which takes probability as an undefinable measurement of attitudes of doubt. 
But if we take probability as a relative frequency, then to speak of the probability of a sample 
such as that given by throwing a die or growing wheat on a plot of soil, we must consider the 
sample against the background of a population. There are obvious logical difficulties in 
regarding such a sample as a selection—it is a selection without a choice—and still greater 
difficulties about supposing the selection to be random ; for to do so we must try to imagine 
that all the other members of the population, themselves imaginary, had an equal probability 
of assuming the mantle of reality, and that in some way the actual event was chosen to do so. 
This is, to me at all events, a most baffling conception. At the same time, it has to be 
admitted that certain events such as dice-throwing do happen as if the constituents were 
chosen at random from an existent population, and it accordingly seems that the concept of 
the hypothetical population can be justified empirically.

Randomness in Sampling 

8.4. In its colloquial use the word “random” is applied to any method of choice
which lacks aim or purpose. We speak of drawing names at random out of a hat, choosing
plants at random from a field of corn, selecting family budgets at random from the
population, meaning thereby that the selection is completely haphazard.

Now it is found in practice that choice by a human being is not random in the stricter
sense that it produces equally frequently events which we are entitled to expect to have
equal prior probabilities. Some examples will make this clear.

Example 8.1

In the course of certain work at the Rothamsted Experimental Station sets of eight
wheat plants were chosen for measurement. Six of these were chosen by approved methods,
referred to below, and may be taken to be truly random. The other two were chosen
haphazardly by eye. If, in any set, the eight plants were ranged in order of magnitude, the
two selected by eye could have any number from one to eight; and if they, in common with
the other six, were chosen at random, they should occupy these places with approximately
equal frequency in a large number of sets. Table 8.1 shows what actually occurred on two
different occasions (a) on May 31st, before the ears of wheat had formed, and (b) on June 28th,
after the ears had formed.

TABLE 8.1

Distribution of Plants chosen haphazardly in Ranks 1 to 8.
(F. Yates, Ann. Eugen. Lond., 6, 202.)

                                Numbers bearing Specified Rank.
Date.        Observation.       1    2    3    4    5    6    7    8    Total.
May 31st     Shoot height       9    7   11    8   11   18   21   31     110
June 28th    Ear height         9   19   27   23   15   10    5    4     112

The divergence of actual from expected results is quite striking. On May 31st, before
the ears had formed, the observer was strongly biased towards the taller shoots ; whereas in 
June he was biased strongly towards the central plants and avoided short and tall plants. 

Thus it is seen that bias can appear even in a trained observer, and that the bias need 
not be consistent in over- or under-estimation in different circumstances. 

Example 8.2 

The following table shows the frequencies of final digits in a number of measurements
made by four different observers:

TABLE 8.2

Bias in Scale Reading. Distribution of Final Digits in Measurements by
Four Observers.
(G. U. Yule, J.R. Statist. Soc., 90, 570.)

                 Frequency of Final Digit per 1000.
Final Digit.       A       B       C       D
     0            158     122     251     358
     1             97      98      37      49
     2            125      98      80      90
     3             73      90      72      63
     4             76     100      55      37
     5             71     112     222     211
     6             90      98      71      62
     7             56      99      75      70
     8            120     101      72      44
     9            129      81      65      16
   Total         1001     999    1000    1000

It is hard to suppose that there was any genuine difference which would lead to the 
appearance of certain digits at the expense of others, and we may confidently suppose that 
the deviations from approximate equality indicate bias on the part of the observer. 

Observer A had decided preference for 0, 2, 8 and 9, avoiding the centre of the scale.
Observer B is quite good, his deviations from expected values being small, though he also
showed some preference for 0. Observer C was poor, rounding off one measurement in two
to the whole or half unit. Observer D was obviously very bad indeed, nearly 57 per cent. of
his measurements being rounded off to the whole or half unit.

The observations were all made by reading a scale, those under A being on drawings to
the nearest tenth of a millimetre, those under B, C, and D being measurements on the heads
of living subjects to the nearest millimetre. We may conclude from this that different
observers may exhibit different degrees of bias even under comparable circumstances, and
that even those who are aware of the existence of the possibility of bias and the necessity for
taking great care (as observer A was) may nevertheless fail to avoid it.
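
The rounding described for observers C and D can be verified by simple arithmetic. The sketch below is an illustration only: the frequencies are transcribed from Table 8.2 for the three observers who measured to the nearest millimetre, and the function name is a hypothetical one chosen here. It computes the proportion of final digits that are 0 or 5, i.e. readings rounded to the whole or half unit; an unbiased observer would be expected to give about 0.2.

```python
# Frequencies per 1000 of final digits 0..9, transcribed from Table 8.2.
freq = {
    "B": [122, 98, 98, 90, 100, 112, 98, 99, 101, 81],
    "C": [251, 37, 80, 72, 55, 222, 71, 75, 72, 65],
    "D": [358, 49, 90, 63, 37, 211, 62, 70, 44, 16],
}

def share_whole_or_half(counts):
    """Proportion of readings whose final digit is 0 or 5."""
    return (counts[0] + counts[5]) / sum(counts)

shares = {observer: share_whole_or_half(c) for observer, c in freq.items()}
for observer, share in shares.items():
    print(f"Observer {observer}: {100 * share:.1f} per cent end in 0 or 5")
```

For C the proportion is 0.473, about one measurement in two, and for D it is 0.569, the "nearly 57 per cent." of the text.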

Example 8.3 

An observer was placed before a machine consisting of a circular disc divided into ten 
equal sections in which were inscribed the digits 0 to 9. The disc rotated at high speed and 
every now and then a flash occurred from a nearby electric lamp of such short duration that
the disc appeared at rest. The observer had to watch the disc and write down the number 
occurring in the division indicated by a fixed pointer. 

This was a machine designed for the provision of truly random numbers (see below, 8.10) 
and had been found by another observer to do so. But this particular observer produced 
a definite bias. The frequencies of digits in 10,000 run off by him are shown in Table 8.3. 


TABLE 8.3

Distribution of Digits obtained by an Observer in using a Randomising Machine.
(Kendall and Babington Smith, Supp. J.R. Statist. Soc., 6, 51.)

Digit.          0     1     2     3     4     5     6     7     8     9    Total.
Frequency    1083   865  1053   884  1057  1007  1081   997  1025   948   10,000


If the observer was unbiased the digits should appear in approximately equal numbers;
but there is a bias in favour of all the even numbers and against the odd numbers 1, 3 and 9.
The cause of this bias is obscure, for the observer did not have to estimate (as in the previous
example) but merely to write down something which he saw, or thought he saw. The
explanation seemed to be that he had a strong number-preference, i.e. that he actually
mis-saw the numbers, or that his brain controlled his ocular impressions and censored them. We
have here to deal with one of the deadliest forms of bias in psychology.
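
One way of putting a figure on the divergence in Table 8.3 is the usual chi-square measure of goodness of fit, which this section does not itself introduce; the sketch below assumes it as the yardstick and simply compares each observed frequency with the 1000 expected under equal frequency. Variable names are illustrative.

```python
# Observed frequencies of the digits 0..9, transcribed from Table 8.3.
observed = [1083, 865, 1053, 884, 1057, 1007, 1081, 997, 1025, 948]

total = sum(observed)
expected = total / len(observed)   # 1000 of each digit if all are equally frequent

# Chi-square statistic: sum of (observed - expected)^2 / expected over the ten digits.
chi_square = sum((o - expected) ** 2 / expected for o in observed)

# Totals for even and odd digits, to show the preference noted in the text.
even_total = sum(observed[d] for d in range(0, 10, 2))
odd_total = total - even_total

print(f"chi-square = {chi_square:.3f} on {len(observed) - 1} degrees of freedom")
print(f"even digits: {even_total}, odd digits: {odd_total}")
```

The statistic comes to about 54.6 on 9 degrees of freedom, whereas a value in the neighbourhood of 9 would be typical for an unbiased source, so the deviations are hard to ascribe to sampling fluctuation.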

Example 8.4

Every year a number of crop reporters in England and Wales estimate the prospective
yields of certain crops, forecasts being obtained at different periods of the year and final
estimates when the crop is harvested. Table 8.4 shows the average estimated yield of
potatoes at the various times for the years 1929-1936.

TABLE 8.4

Bias in Crop Forecasting. Forecasts of Yields of Potatoes in England
and Wales (Tons per Acre).
(From the official agricultural statistics.)

            Sept. 1st.              Oct. 1st.               Nov. 1st.
Year.    Yield.  % Difference    Yield.  % Difference    Yield.  % Difference     Final
                 from Final.             from Final.             from Final.     Estimate.
1929      5.7      -17.4          6.2      -10.1          6.5       -5.8           6.9
1930      6.0       -7.7          6.1       -6.2          6.1       -6.2           6.5
1931      5.5        0.0          5.3       -3.6          5.3       -3.6           5.5
1932      6.4       -3.0          6.2       -6.1          6.3       -4.5           6.6
1933      6.4       -4.5          6.2       -7.5          6.4       -4.5           6.7
1934      6.0      -15.6          6.3      -11.3          6.7       -5.6           7.1
1935      5.6       -9.7          5.7       -8.1          6.0       -3.2           6.2
1936      6.0       -3.2          5.9       -4.8          5.8       -6.5           6.2

This table exhibits very clearly an effect which has shown itself in nearly all the English
crop reports (and appears also in other countries), namely, the chronic pessimism of crop
forecasts. In every case but one in the table the forecasts are below the final estimate.
Nor do crop reporters seem able to learn by experience that they are underestimating.
Nothing in this table indicates that the differences between forecast and final estimate
diminished during the period concerned.

It should also be noticed that these estimates are the weighted average of a large number
of independent observations. One of the commoner misunderstandings in this type of work
is based on the supposition that, though individuals may make mistakes, their errors will
cancel out in the aggregate. Our present example shows this to be untrue in general.
There can appear a systematic bias affecting all the individuals performing estimates.
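
The statement that the forecasts fall below the final estimate in every case but one can be checked directly from the yields. The following sketch assumes the yields of Table 8.4 as transcribed here (tons per acre, to one decimal place); because the printed percentages were presumably computed from unrounded figures, the recomputed percentages may differ from them in the last place.

```python
# year: ((Sept. 1st, Oct. 1st, Nov. 1st forecasts), final estimate), from Table 8.4.
yields = {
    1929: ((5.7, 6.2, 6.5), 6.9),
    1930: ((6.0, 6.1, 6.1), 6.5),
    1931: ((5.5, 5.3, 5.3), 5.5),
    1932: ((6.4, 6.2, 6.3), 6.6),
    1933: ((6.4, 6.2, 6.4), 6.7),
    1934: ((6.0, 6.3, 6.7), 7.1),
    1935: ((5.6, 5.7, 6.0), 6.2),
    1936: ((6.0, 5.9, 5.8), 6.2),
}

below = equal = above = 0
for year, (forecasts, final) in yields.items():
    # Percentage difference of each forecast from the final estimate.
    diffs = [100 * (f - final) / final for f in forecasts]
    below += sum(f < final for f in forecasts)
    equal += sum(f == final for f in forecasts)
    above += sum(f > final for f in forecasts)
    print(year, [f"{d:+.1f}" for d in diffs])

print(f"below final: {below}, equal to final: {equal}, above final: {above}")
```

Of the 24 forecasts, 23 lie below the final estimate and one equals it; none lies above, so the errors do not cancel on averaging.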

8.5. The foregoing examples are enough to indicate that human bias is very prevalent.
Trained observers may be biased even when conscious of their own imperfections; different
observers may be biased in different ways in similar circumstances; and the same observer
may be biased in different ways in different circumstances. It is abundantly clear that we
must look for true randomness elsewhere than in mere lack of purpose on the part of human
observers. There may be persons whose psychological processes are so finely balanced that
they can deliberately select random samples, but few statisticians who have experimented in
this interesting field would regard themselves as among them.

8.6. In Chapter 7 we saw that the primary function of randomness in probability was
that it ensured that certain primitive events were equally probable. We may say that
a method of selection is random for a population U if, when applied to U, it gives all members
an equal probability of being chosen; or, in the language of frequency, if, when continually
applied to U, it educes the members approximately equally frequently.

But this is not enough. Suppose we had a population of two members A and B, and
sampled with replacement. Then a method which chooses A and B alternately and produces
the series ABAB . . . educes each member approximately equally frequently; but it is
not what we customarily mean by a random method. What we require of a random method
is that in such circumstances it should produce a series like that of von Mises (7.15) in which
no systematic arrangement is evident. Not only single characteristics, but all possible
groups of characteristics should appear equally frequently.
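
The point about the alternating series can be made concrete by counting adjacent pairs as well as single members. This is a minimal sketch under stated assumptions: the comparison series is produced by Python's pseudo-random generator with a fixed seed, which merely stands in for a random method and is not a procedure described in the text, and the function name is illustrative.

```python
from collections import Counter
import random

def pair_frequencies(series):
    """Relative frequencies of the four possible overlapping adjacent pairs."""
    pairs = Counter(a + b for a, b in zip(series, series[1:]))
    total = sum(pairs.values())
    return {p: pairs.get(p, 0) / total for p in ("AA", "AB", "BA", "BB")}

# The systematic method: A and B chosen alternately.
alternating = "AB" * 500

# A stand-in for a random method (assumption: a seeded pseudo-random generator).
rng = random.Random(0)
haphazard = "".join(rng.choice("AB") for _ in range(1000))

print("single frequencies, alternating:", Counter(alternating))
print("pair frequencies, alternating:  ", pair_frequencies(alternating))
print("pair frequencies, pseudo-random:", pair_frequencies(haphazard))
```

Both series give A and B about equally often, but in the alternating series the pairs AA and BB never occur, whereas in the second series all four pairs appear with frequencies near one quarter.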

8.7. A further point is to be noted. We may, in drawing the sample, be interested in
one particular variate exhibited by the members, and it is possible that a method may give
a satisfactory random sample so far as this variate is concerned without doing so for other
variates. Suppose, for example, we are anxious to take a random sample from the
inhabitants of a particular street. If we are concerned with a variate such as eye-colour it
might be sufficient to choose a house every so often, say every tenth house, and take the
inhabitants of that house as part of the sample. Such a method would not give every
inhabitant an equal chance of being chosen; but if we look back to the time when the
inhabitants took up residence we may imagine that the colour of their eyes did not influence
their geographical distribution, and thus that if we consider the allocation of the inhabitants
in some way independent of eye-colour, and then take every tenth house, we may suppose
that so far as eye-colour is concerned the sample is random. But the matter would stand
differently if we were sampling for income. If for instance every tenth house was a corner
house and thus inhabited by a person of more than average income, our sample would no
longer be random with respect to income. Looking back, as before, to the time when
inhabitants took up residence, we see that they can no longer be regarded as distributed at
random, for those with larger incomes will tend to be attracted towards the more expensive
houses.

Thus a method which is random for one population may not be so for another ; and even 
in the same population a method random for one variate may not be so for another. 
Randomness is relative. 
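
The corner-house case can be illustrated with a toy street. Everything in the sketch below is hypothetical: a street of 200 houses is assumed, every tenth house is taken to be a corner house, and the two income figures are invented purely to show the effect of taking every tenth house.

```python
# Hypothetical street: houses numbered 1..200, every tenth one a corner house.
# The income figures (900 and 300) are invented for illustration only.
incomes = [900 if house % 10 == 0 else 300 for house in range(1, 201)]

def mean(values):
    return sum(values) / len(values)

population_mean = mean(incomes)

# Every tenth house starting at house 10: all of them corner houses.
sample_corner = incomes[9::10]
# Every tenth house starting at house 3: none of them corner houses.
sample_plain = incomes[2::10]

print("population mean income:     ", population_mean)
print("every tenth from house 10:  ", mean(sample_corner))
print("every tenth from house 3:   ", mean(sample_plain))
```

Neither systematic sample reproduces the population mean of 360; the result depends entirely on how the sampling interval falls relative to the arrangement of the houses, which is why the method cannot be called random with respect to income.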

The Technique of Random Sampling 

8.8. Suppose, then, that we are given a population and a variate is specified. How are 
we to draw a random sample, i.e. how can we find a method which is random for that popula¬ 
tion and that variate? The answer lies partly in theory and partly in practice. 

(a) In the first place we must require that there is no obvious connection between the 
method of selection and the properties under consideration. The method and the properties 
must be independent so far as our prior knowledge is concerned. In sampling a field of 
wheat for shoot height, for example, we must not use a method which could be influenced by 
that height, such as skimming a hoop over the field and selecting the plants round which it 
fell (for the hoop might tend to catch on the taller plants). Again, in sampling the 
inhabitants of a town by choosing names from a telephone directory we should undoubtedly 
tend to get the more well-to-do classes and hence, if the variate under consideration is wealth 
or any related characteristic such as number of children, political opinion, standard of 
education and so on, the sample would not be random. If we were concerned with
characteristics such as height, hair colour, or blood group the sample might be random, though it is
not difficult in many similar cases to think of reasons why the variate might be linked with 
wealth. 

If this matter is viewed from the standpoint of the axiomatic theory of probability the
absence of knowledge about relationship between the method and the characteristic under
consideration may be sufficient to ensure randomness, for the probabilities of elementary
propositions then become equal, the probabilities being measures of prior attitudes of mind.*
But if the frequency viewpoint is adopted it is not enough that there should be absence of
knowledge of this kind, for unknown to the observer there may be relations which will prevent
the elementary propositions from being true in approximately equal proportions. The
presumption is that if we make as great an effort as possible to ascertain whether any
relationship exists and fail to find it, there is no relationship; and hence we can assume randomness
with more or less confidence. But in this approach the assumption of randomness is
ultimately part of the general uncertainty of the inference from sample to population.

* At least, this is my interpretation of the position; but the writers on the axiomatic theory have
not discussed randomness at any length, being content to define it in terms of probability, and I may
be putting a gloss on their views which they would not accept.

(b) Secondly, we may rely on previous experience of a random method to justify its use
on new occasions. This is evidently an extrapolation, and though most people would regard
it as reasonable, the fact has to be realised. The axiomatic theory of probability can embrace
this extrapolation within its scope, for the probabilities given by the method are assessable
in terms of prior knowledge; but the frequency theory has to take the extrapolation as an
additional assumption.

8.9. One of the most reliable methods of drawing random samples consists of constructing
a model of the population and sampling from the model. We may, for instance, note
down the characteristics of each member on a card and sample by choosing cards from the 
pack corresponding to the whole population. This is the method adopted in lotteries and 
the process is known as lottery or ticket sampling. It is moderately effective but suffers in 
practice from two disadvantages: the labour of constructing the card population, and the 
danger of bias in the drawing of cards. Example 12.1 below, for instance, shows that the 
ordinary processes of shuffling and dealing playing-cards may fail to be satisfactory. To be 
reasonably satisfied about the randomness of the shuffling entails a good deal of trouble and 
labour, and the same object can be attained much more simply by the use of random sampling 
numbers, which we now consider. 

Random Sampling Numbers 

8.10. The easiest way of constructing a miniature population is to attach an ordinal
number to each member, most simply by numbering the members from 1 onwards. The
set of ordinals so obtained is the miniature population and the problem of drawing a random
sample reduces to finding a series of random numbers. The advantages of this method are
obvious: no physical model population has to be constructed; the numbering can be carried
out in any convenient manner; and the series of random numbers can be applied to any
enumerable population so that any series of random numbers has a very wide range of
application.

One point should be made clear here. If the numbering of the population is carried out 
in such a way as to be independent of certain characteristics of the population, any set of 
numbers will serve to draw a sample random with respect to those characteristics. The 
randomness in such a case lies, so to speak, in the allocation of ordinals to the population, 
not in deciding which ordinals to select for the sample. But in practice a procedure of this
kind is of no value, since it only throws back to the difficulty of numbering the population
“at random”. The usual course is to number the population in any convenient way, related
to the characteristics or not, and then seek for a set of numbers which are a random set from
the possible ordinals of the population.

8.11. One of the more obvious ways of drawing random samples from an enumerated
population is to use haphazard numbers taken from some totally unrelated source. Suppose,
for instance, we wished to take a sample from the visible stars in the sky. We will ignore the
small complications due to the existence of double stars and unresolved objects. Since the
position of a star on the celestial sphere is defined by latitude and longitude, what is then
required is a series of random pairs of latitudes and longitudes. At first sight it seems
plausible to take an ordinary atlas and choose the figures set out in the index for place-names
arranged alphabetically; for there is little reason to expect any relationship between the
distribution of stars in the sky and the distribution of places on the Earth’s surface. A little
reflection, however, will show that the method is unsound. There are large stretches of
territory and sea on the Earth which have no place-names on them (the poles, deserts and
oceans); consequently no numbers will occur for these regions and there will be corresponding
areas on the celestial sphere which have no chance of being included.

8.12. As a next attempt we might take a book containing a number of digits, e.g.
a telephone directory, or a set of statistical tables or mathematical tables, open it at hazard 
and choose the digits which first strike the eye, or which occur at the top of the page, and so 
on. This is an improvement, but it is still open to some objection. 

(a) Telephone directories. Table 1.4 on page 6 shows the distribution of 10,000 digits
taken from the London telephone directory. Pages were chosen by opening the directory
haphazardly; numbers of less than four digits and numbers in heavy type were ignored; and
of the four-figure numbers remaining the two right-hand ones were taken for all numbers on
the page. If the numbers were random we should expect about 1000 of each digit in the total
of 10,000. Actually there are very considerable deviations from this expectation, and we shall
see in a later chapter that they cannot be explained as sampling fluctuations. There are
significant deficiencies in 5’s and 9’s, due to several causes such as the tendency to avoid
these digits because they sound alike, the reservation of numbers ending in 99 for testing
purposes by telephone engineers and so on. It is evident that tables of random numbers
could not be constructed from directories such as this.

(b) Mathematical tables. Evidently care has to be exercised in using mathematical
tables in constructing random series. Suppose, for instance, we take a set of logarithm
tables. There are clearly relationships between successive logarithms, expressible by the
fact that differences are approximately constant if the interval is small. Moreover there is
a very curious theorem about digits in certain classes of table which throws theoretical doubt
on the method. Consider the logarithms to base 10 of the natural numbers from 1 onwards.
Suppose we choose the kth digit in each and so obtain a series of numbers 0-9. Then the
proportional frequency of any digit in this series does not tend to a limit as the length of the
series increases, whatever k may be.* Just what does happen does not appear to be known,
but it would seem that certain systematic effects begin to show themselves and these will
obviously endanger the randomness of the series.

(c) Statistical tables. If we have a volume of statistics such as populations of towns 
and rural districts there are some grounds for supposing that if the numbers are large—say 
four figures or more—the final digits will be random. Here again, however, the use of such 
tables requires care—they may have been compiled by an observer with number preferences, 
and some rounding up may have taken place. 

8.13. However, the necessity for the ordinary student to construct random series of 
his own has been obviated by the publication of various tables of Random Sampling Numbers. 
There are three such available ;— 

(a) Tippett’s numbers comprise 41,600 digits taken from census reports combined into
fours to make 10,400 four-figure numbers (Tracts for Computers, No. 15).

(b) Kendall and Babington Smith’s numbers comprise 100,000 digits grouped in twos
and fours and in 100 separate thousands (Tracts for Computers, No. 24). These numbers were
obtained from a machine specially constructed for the purpose on the lines very briefly
described in Example 8.3.

* Footnote to 8.12 (b): Cf. J. Franel, Vierteljahrsschrift der Naturforschenden Gesellschaft in Zürich
(1917), 62, 286. So great a mathematician as Poincaré made a mistake on this point.

(c) Fisher and Yates’ numbers comprise 15,000 digits arranged in twos (Statistical Tables
for Biological, Agricultural and Medical Research). These numbers were obtained from the
15th-19th digits in A. J. Thompson’s tables of logarithms and were subsequently adjusted,
it having been found that there were too many sixes.

Before considering the basis of these tables it may be helpful to give some examples of 
their use. Here are the first 200 of the Kendall-Babington Smith tables:— 

TABLE 8.5

Random Sampling Numbers.
(Tracts for Computers, No. 24.)

23 15 75 48   59 01 83 72   59 93 76 24   97 08 86 95   23 03 67 44
05 54 55 50   43 10 53 74   35 08 90 61   18 37 44 10   96 22 13 43
14 87 16 03   50 32 40 43   62 23 60 05   10 03 22 11   54 38 08 34
38 97 67 49   61 94 05 17   58 63 78 80   59 01 94 32   42 87 16 95
97 31 26 17   18 99 76 53   08 70 94 25   12 58 41 54   88 21 06 13


Example 8.5

To draw a sample of 10 men from the population of 8585 men of Table 1.7.

The first process is to number the population; and here, as in most similar cases, one
numbering has already been provided by the frequency-distribution. We take numbers
1 and 2 to be those in the group 57- inches, numbers 3 to 6 those in the group 58-, and so on,
those in the group 77- inches being numbers 8584 and 8585.

Now we take four-figure numbers from the tables until 10 usable ones have been obtained,
e.g. reading across in Table 8.5 we have

2315, 7548, 5901, 8372, 5993, 7624, [9708], [8695], 2303, 6744, 0554, 5550.

The two numbers in square brackets are greater than 8585 and we ignore them. We
now select the individuals corresponding to the remaining 10 numbers. They will be found
to be in the intervals 65-, 70-, 68-, 72-, 68-, 70-, 65-, 69-, 63-, 68- inches respectively.

The mean of these values considered as located at the centres of intervals is 68.24, as
against a value in the population of 67.46.
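
The selection rule of this example (read four-figure numbers in order, ignore any that do not correspond to a member, stop when the sample is complete) can be sketched as follows. The list of numbers is the one quoted in the example; the function name and its arguments are hypothetical, and the mapping from ordinal to height group is left out because it needs the frequencies of Table 1.7.

```python
# Four-figure numbers read across the table of random sampling numbers,
# as quoted in the example (kept as text so that leading zeros are visible).
four_figure = ["2315", "7548", "5901", "8372", "5993", "7624",
               "9708", "8695", "2303", "6744", "0554", "5550"]

POPULATION_SIZE = 8585   # members are numbered 1 to 8585

def draw_sample(numbers, population_size, sample_size):
    """Take numbers in order, ignoring those outside 1..population_size,
    until sample_size ordinals have been chosen."""
    chosen, ignored = [], []
    for text in numbers:
        value = int(text)
        if 1 <= value <= population_size:
            chosen.append(value)
        else:
            ignored.append(value)
        if len(chosen) == sample_size:
            break
    return chosen, ignored

chosen, ignored = draw_sample(four_figure, POPULATION_SIZE, 10)
print("chosen ordinals: ", chosen)
print("ignored numbers: ", ignored)
```

Twelve numbers have to be read to obtain ten usable ordinals, the two ignored being those that exceed 8585.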

Example 8.6 

To draw a sample of 12 from the population in the following bivariate table, showing 
the relation between inoculation and attack in cholera. 



                      Not Attacked.     Attacked.        Total.
Inoculated            276               3                279
                      (0001-3312)       (3313-3348)
Not inoculated        473               66               539
                      (3349-9024)       (9025-9816)
Totals                749               69               818

There are now 818 members. We could, of course, take three-figure numbers from
the tables, obtaining, e.g. from Table 8.5

231, 575, 485, etc.

But this is rather troublesome as the numbers are not grouped in threes. It is more
convenient to take four-figure numbers as before and to associate each member of the population
with 12 numbers in the tables, e.g. the first would correspond to 0001-0012, the second
to 0013-0024, and so on. We then get the numbers shown in brackets in the above
table. Numbers above 9816 we ignore as before.

The two numbers omitted in the previous example can now be used, and we find the
following results:

                      Not Attacked.     Attacked.        Total.
Inoculated            3                 0                3
Not inoculated        8                 1                9
Totals                11                1                12

Here, for example, the member corresponding to the number 2315 falls in the not-attacked:
inoculated class, and so on.

It has so happened in this example that no member in the very small class inoculated : 
attacked class has been selected. Suppose we had had a series containing 

3314, 3323, 3333, 3341. 

All those fall into the group and there are four of them, as against only three members 
in the population. Had we been confronted with this position we should have had to 
decide whether the sampling was to be with or without replacement. If it was without 
replacement, we should have to suppose that the first three numbers in the group 3313-3348 
exhausted that part of the population and ignore all numbers of the group occurring sub¬ 
sequently. 
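The allocation of four-figure numbers to the cells of the cholera table can be sketched in a few lines of Python. This is only an illustration: the function names `classify` and `tally` are hypothetical, the cell limits are the bracketed ranges of the table (with 9025 taken as the lower limit of the last cell, since 66 members at 12 numbers each end at 9816), and sampling is treated as being with replacement.

```python
# Cell limits copied from the bracketed ranges of the cholera table.
# Assumption: 0000 and every number above 9816 are ignored.
CELLS = [
    ("inoculated, not attacked", 1, 3312),
    ("inoculated, attacked", 3313, 3348),
    ("not inoculated, not attacked", 3349, 9024),
    ("not inoculated, attacked", 9025, 9816),
]


def classify(number):
    """Return the cell label for a four-figure number, or None if it is ignored."""
    for label, low, high in CELLS:
        if low <= number <= high:
            return label
    return None


def tally(numbers):
    """Count how many of the numbers fall in each cell (sampling with replacement)."""
    counts = {label: 0 for label, _, _ in CELLS}
    for number in numbers:
        label = classify(number)
        if label is not None:
            counts[label] += 1
    return counts


# The twelve numbers read from Table 8.5 (0554 is written 554 as a Python integer).
numbers = [2315, 7548, 5901, 8372, 6993, 7624, 9708, 8695, 2303, 6744, 554, 5550]
sample_counts = tally(numbers)
```

Sampling without replacement would need an additional check that no member is drawn twice, which this sketch does not attempt.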

Example 8.7

To construct a series of random permutations of the numbers 1 to 5.

Here we are not concerned with the digits 0, 6, 7, 8 and 9 and so ignore them in the
table of random numbers. We read through the table and note the digits as they occur,
e.g. in Table 8.5 we have 2315, 7548, etc. The 7 is to be ignored and also the second 5,
for one 5 has already occurred. We then reach the permutation 23154. Then we start
again, the next series being 8, 5901, 8372, 6993, 7624, etc., giving the permutation 51324;
and so on.
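The reading rule of this example may be sketched as follows. The function name `permutations_from_digits` is hypothetical, and the digit stream is simply the run of numbers quoted from Table 8.5.

```python
def permutations_from_digits(digits, size=5):
    """Yield permutations of 1..size built by reading a stream of decimal digits.

    Digits outside 1..size are ignored, as are digits already used in the
    permutation under construction.
    """
    wanted = set(range(1, size + 1))
    current = []
    for ch in digits:
        if not ch.isdigit():
            continue  # skip spaces and other separators
        d = int(ch)
        if d not in wanted or d in current:
            continue
        current.append(d)
        if len(current) == size:
            yield current
            current = []  # start afresh for the next permutation


stream = "2315 7548 5901 8372 6993 7624"
perms = list(permutations_from_digits(stream))
```

Any unfinished permutation at the end of the stream is discarded.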

Example 8.8

To take a random sample of 10 from the normal population

\[ dF = \frac{1}{\sqrt{2\pi}}\, e^{-x^{2}/2}\, dx. \]

This is a particularly interesting case, for we have to select a sample from an infinite 
population. Such a process, as has been seen, can only be considered as a limiting one. 






Consider the frequencies of the normal curve in ranges of 0.1 on each side of the mean.
These may be obtained very simply from tables of the normal integral by differencing
and in fact are given in many tables of that integral, e.g. that of Appendix Table 2. Suppose
the frequencies rounded up to four places of decimals, e.g. those near the mean would be

    0.0-      0.0398
    0.1-      0.0394
    0.2-      0.0387
    0.3-      0.0376, etc.

and the total frequencies are given by the normal integral itself, e.g.

    Upper Limit of   Frequency up to      Upper Limit of   Frequency up to
    Interval.        that Limit.          Interval.        that Limit.

     0.0             0.5000               -0.1             0.4602
     0.1             0.5398               -0.2             0.4207
     0.2             0.5793               -0.3             0.3821
     0.3             0.6179, etc.         -0.4             0.3446, etc.


We may now attach a four-figure random number to this population, which is finite and
discontinuous: e.g. the number 5461 corresponds to a variate-value +0.1- and the number
3500 to -0.4-.

Had we taken the table to n places of decimals we should have required n-figure
numbers. Furthermore, we can make the approximation more exact by taking a finer
variate interval. Such matters as this are to be decided in the light of the degree of
approximation required.
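A minimal sketch of this correspondence between four-figure numbers and variate intervals is given below. It is not the tabular procedure itself: instead of a printed table rounded to four places it inverts the normal integral with the standard library, and the function name `interval_for_number` and the half-unit offset are assumptions made for the illustration.

```python
import math
from statistics import NormalDist


def interval_for_number(number, width=0.1, figures=4):
    """Return k such that the number falls in the interval [k*width, (k+1)*width).

    Assumption: the number is turned into a cumulative frequency by the
    mid-point rule (number + 0.5) / 10**figures, which keeps it strictly
    between 0 and 1 so that the inverse normal integral is always defined.
    """
    u = (number + 0.5) / 10 ** figures
    x = NormalDist().inv_cdf(u)  # variate-value with cumulative frequency u
    return math.floor(x / width)


k_high = interval_for_number(5461)  # interval beginning at +0.1
k_low = interval_for_number(3500)   # interval beginning at -0.4
```

Near an interval boundary the result may differ from that of a printed table, because the table is rounded to four places whereas this sketch is not.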

8.14. Random Sampling Numbers must obey certain conditions before they can be 
used. Any set of numbers whatever is random in the sense that it might arise, with however 
great improbability, from random sampling ; but such a set might not be suitable as a 
table of Random Sampling Numbers. From the examples already given it is clear that
we desire such a table to have very great flexibility. It should give random results in as 
many cases as possible, whether used in part or in whole. 

Now it is impossible to construct a table of Random Sampling Numbers which will
satisfy this requirement entirely. Suppose, to take an extreme case, we constructed a
table of $10^{10^{10}}$ digits. The chance of any digit being a zero is $\frac{1}{10}$ and thus the chance that
any given block of a million digits are all zeros is $10^{-10^{6}}$. Such a set should therefore arise
fairly often in the set of $10^{10^{10}-6}$ blocks of a million. If it did not, the whole set would
not be satisfactory for certain sampling experiments. Clearly, however, the set of a million
zeros is not suitable for drawing samples in an experiment requiring less than a million
digits.

Thus, it is to be expected that in a table of Random Sampling Numbers there will
occur patches which are not suitable for use by themselves. The unusual must be given 
a chance of occurring in its due proportion, however small. Kendall and Babington Smith 
attempted to deal with this problem by indicating the portions of their table (5 thousands 
out of 100) which it would be better to avoid in sampling experiments requiring fewer 
than 1000 digits. 

8.15. If a table of random numbers is used to draw members from a population
of ten, we expect the members to appear in approximately equal proportions. In other
words we expect such a table to contain the ten digits 0-9 in approximately equal
proportions. Similarly we expect the hundred pairs 00-99 to appear in approximately equal
proportions, and so on. Various tests of this kind, based on a comparison between actual
frequencies and those required to satisfy the laws of probability, can be devised. No
table can satisfy them all, but if it satisfies tests which (a) ensure the randomness of the
numbers for the commoner types of sampling inquiry for which it is likely to be used and
(b) are capable of revealing any particular sort of bias to which the numbers are susceptible
in virtue of their mode of formation, it is likely to be of general application.
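One simple comparison of this kind, for the frequencies of single digits, might be sketched as below. The choice of a sum of squared deviations divided by the expected frequency is an assumption of this illustration (the passage above does not prescribe a particular measure), and the function name is hypothetical.

```python
from collections import Counter


def digit_frequency_chi_square(digits):
    """Sum over the ten digits of (observed - expected)^2 / expected.

    Assumption: under randomness each digit is expected in one-tenth of the
    positions; non-digit characters in the input are skipped.
    """
    counts = Counter(ch for ch in digits if ch.isdigit())
    total = sum(counts.values())
    expected = total / 10
    return sum((counts.get(str(d), 0) - expected) ** 2 / expected for d in range(10))


balanced = digit_frequency_chi_square("0123456789" * 5)  # every digit equally frequent
all_zeros = digit_frequency_chi_square("0" * 10)         # an extreme patch
```

A large value indicates a patch in which the digits are far from equally frequent; how large is too large depends on the length of the patch and is not settled by this sketch.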

For a more detailed discussion of these matters and the results of tests on the Tippett 
tables, the Kendall-Babington Smith tables and the Fisher-Yates tables, reference may be
made to the works listed at the end of the chapter. 

Sampling from a Continuous Population 

8.16. Random Sampling Numbers offer the best method known at the present time
of drawing random samples from an enumerable universe, and as was seen in Example 8.8,
may also be used to draw samples from a continuous population specified mathematically.
But cases sometimes occur in which they cannot be employed. For instance, if we wish
to take a sample of milk or flour, we cannot in practice number each particle and extract
it from the population for examination. In such cases we are usually compelled to fall
back on more intuitively grounded procedure. To take a random sample from a milk
churn, for example, we might stir the contents thoroughly and scoop up a sample
haphazardly. Sometimes, when the population is of manageable size, we can proceed
systematically by dividing it into a number of parcels and selecting parcels by the ordinary
technique of random numbers. Most sciences have their own peculiar sampling problems 
and no attempt can be made here to discuss them all. At this point we leave the technique 
of random sampling and assume hereafter, unless the contrary is stated, that the material 
we are discussing has been obtained by a random process. 


Sampling from Attributes

8.17. As an introduction to the general sampling problems we shall consider the
sampling of attributes, which raises all the difficulties of principle but is not obscured by
too much mathematics.

Suppose we have a random sample from a population whose members all exhibit either
an attribute A or its negative not-A. Our sample is n in number, and a proportion p, or
a number pn, exhibit the attribute; and consequently a proportion q, or number qn
(p + q = 1), do not. We will assume that the population is large, or that sampling is with
replacement, so that the probability of obtaining an A at any drawing is not affected by
other drawings and is therefore a constant, say $\varpi$.

The problems we have to consider are of three types:— 

(a) Suppose we have some reason for supposing that the proportion of A's in the
population is given by a known $\varpi$. Does the observed proportion p bear out this hypothesis
or is it so divergent from $\varpi$ as to lead us to doubt the hypothesis? In an experiment with
plants exhibiting two strains of a quality such as height in pea plants, we may wish to
test whether the breeding follows the simple Mendelian law of dominant and recessive.
If we begin with two pure strains tall and short, cross-breed a first generation and then
produce a second generation by interbreeding, the proportional frequencies of "short"
and "tall" in this generation will be 3/4 and 1/4 if "short" is dominant and 1/4 and 3/4 if "tall"
is dominant, provided that the simple Mendelian law holds. Suppose we carry out such





an experiment and find that for 400 plants the frequencies are 70 and 330. Can the
divergence from the theoretical values 100 and 300 have arisen by chance, or is it large enough
to throw doubt on the hypothesis that the simple Mendelian law is operating?

(b) In the foregoing type of problem we have some reason for testing a value of $\varpi$
given a priori; but we may know nothing of $\varpi$, and in such a case our principal problem
is to estimate it from the sample.

(c) Then, having estimated it, we wish to know the degree of reliability of the estimate.
How far is the estimate likely to deviate from the real value of $\varpi$?

8.18. Consider first of all the first type of problem, in which $\varpi$ is given a priori. If
we choose samples of n from the population they will, on our supposition that the
probability of obtaining an A remains constant, be arrayed by the binomial $(\chi + \varpi)^{n}$, that
is to say, the probability of obtaining pn A's and qn not-A's in a sample of n is the term in
$\varpi^{pn}\chi^{qn}$ in $(\chi + \varpi)^{n}$, i.e. $\binom{n}{pn}\varpi^{pn}\chi^{qn}$, where $\chi = 1 - \varpi$. Thus the probability of obtaining
pn or fewer A's is the sum of the first pn + 1 terms in this binomial.

If this probability is small we have the choice of three possibilities:

(a) An improbable event has occurred.

(b) The hypothesis is not true, i.e. the proportion of A's in the population is not $\varpi$.

(c) The sampling process is not random.

We can usually exclude (c) by taking care with the sampling process, and merely have
to balance (a) and (b). It is in general accordance with the notions discussed in the previous
chapter that we dismiss a hypothesis which gives rise to an improbable event in favour
of one which makes it more probable. Thus if the improbability is great we reject the
hypothesis, i.e. we are led to doubt (with greater or less force, according to the degree of
improbability) the supposition that the proportion of A's in the population really is $\varpi$.
Per contra, if the event is probable it casts no doubt on the hypothesis.

Example 8.9 

In certain coin-tossing experiments a coin was tossed 20 times and came down heads
15 times. Does this conflict with the hypothesis that the coin was unbiased?

The hypothesis we have to test here is that $\varpi = \frac{1}{2}$. The probability that in 20 tosses
we should get 5 tails or fewer is the sum of the first 6 terms in $(\frac{1}{2} + \frac{1}{2})^{20}$, and will be found
from Table 6.2 to be 0.0207. Thus the probability of such an event is small (the odds
are 50 to 1 against the event) and we suspect the hypothesis accordingly. If, on the
other hand, we had supposed the value of $\varpi$ to be 0.7 we should have found the first
6 terms of $(0.3 + 0.7)^{20}$ to be 0.4163, so that the event is no longer improbable, and we
should not have rejected the hypothesis.
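The two sums of binomial terms quoted in this example can be checked by direct summation. The sketch below uses only the standard library; the function name `prob_at_most` is hypothetical.

```python
from math import comb


def prob_at_most(k, n, prob):
    """Probability of k or fewer occurrences in n independent trials.

    `prob` is the constant probability of an occurrence at each trial, so the
    result is the sum of the first k + 1 terms of the binomial.
    """
    return sum(comb(n, j) * prob ** j * (1 - prob) ** (n - j) for j in range(k + 1))


# Five tails or fewer in 20 tosses, first for an unbiased coin,
# then for a coin with probability 0.7 of heads (0.3 of tails).
tails_if_unbiased = prob_at_most(5, 20, 0.5)
tails_if_heads_07 = prob_at_most(5, 20, 0.3)
```

For large n the same function still works but becomes slow, which is the practical reason for the normal approximation discussed next.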


8.19. In the example just given we purposely took a fairly low value of n in order
that the terms of the binomial could be calculated directly. In practice n is often fairly
large (100 or more) and the evaluation and summation of individual terms would be
most tedious. We can, if complete accuracy is desired, use the method of summation
given in 5.7, and evaluate from the incomplete B-function. But for all ordinary purposes
it is quite enough to use the normal approximation to the binomial. We saw in Example
4.6 that as $n \to \infty$ the binomial $(\chi + \varpi)^{n}$ tends to the normal distribution with mean $n\varpi$
and variance $n\varpi\chi$. Probabilities can therefore be evaluated from the normal integral.





In fact, for many purposes, it is not even necessary to carry out the actual evaluations.
From the tables of the integral (Appendix Table 2) we note that the probability of a
deviation as great or greater in absolute value than the standard deviation is 0.3173; than
twice the s.d. is 0.0455; than thrice the s.d. is 0.0027; than four times the s.d. is 0.00006.
Thus if we find np differs from $n\varpi$ by more than twice $\sqrt{n\varpi\chi}$ we begin to doubt the
hypothesis, and if the difference is more than thrice $\sqrt{n\varpi\chi}$ we may confidently reject it.
Similarly, if the proportion p differs from $\varpi$ by more than twice $\sqrt{\varpi\chi/n}$ we begin to
doubt the hypothesis, and so on. It makes no difference whether we compare the actual
frequencies or the proportions.

Example 8.10

In some dice-throwing experiments Weldon threw dice 49,152 times, and of these
25,145 yielded a 4, 5, or 6. Is this consonant with the hypothesis that the dice were
unbiased?

If the dice are unbiased the probability of a 4, 5 or 6 is $\frac{1}{2}$. Thus $n\varpi$ is 24,576 and the
observed np is 569 in excess of this value.

\[ \sqrt{n\varpi\chi} = \sqrt{\tfrac{1}{2} \times \tfrac{1}{2} \times 49{,}152} = 110.9. \]

The observed deviation is more than 5 times this quantity and we accordingly suspect
very strongly that the dice were biased.
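The rule of thumb of 8.19 and the arithmetic of this example can be reproduced with the short sketch below. The function names are hypothetical; the tail probabilities are obtained from the complementary error function of the standard library rather than from a printed table.

```python
from math import erfc, sqrt


def two_sided_tail(t):
    """Probability that a standard normal variate exceeds t in absolute value."""
    return erfc(t / sqrt(2))


def deviation_in_standard_errors(successes, n, prob):
    """Observed minus expected frequency, divided by sqrt(n * prob * (1 - prob))."""
    expected = n * prob
    standard_error = sqrt(n * prob * (1 - prob))
    return (successes - expected) / standard_error


# Weldon's dice data: 25,145 throws of 4, 5 or 6 out of 49,152.
ratio = deviation_in_standard_errors(25145, 49152, 0.5)
tails = [two_sided_tail(t) for t in (1, 2, 3, 4)]
```

The list `tails` corresponds to the four probabilities quoted for one, two, three and four times the standard deviation.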


Standard Error

8.20. The quantity $\sqrt{n\varpi\chi}$ is a particular case, appropriate to the binomial, of an
important statistical concept known as the standard error. It is the standard deviation of
the sampling distribution of the statistic np. It is particularly important in the class of
cases, which is relatively large, wherein that sampling distribution can be taken to be
normal either exactly or to an adequate degree of approximation.


8.21. Let us now turn to the case in which no value of $\varpi$ is given a priori. If the
sample gives a proportion of A's equal to p, what shall we take as our estimate of $\varpi$?
The most obvious course is to take p itself; and this is the course dictated by the more
sophisticated ideas described in Chapter 7.

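Consider first of all the method of maximum likelihood. The probability of obtaining
np A's and nq not-A's is

\[ \binom{n}{np}\varpi^{np}\chi^{nq}. \qquad (8.1) \]

This is proportional to the likelihood, and neglecting constants we have to maximise
$L = \varpi^{np}(1 - \varpi)^{nq}$ for variations of $\varpi$. We have, since L is non-negative, that
$\frac{dL}{d\varpi}$ and $\frac{d}{d\varpi}(\log L)$ $\left(= \frac{1}{L}\frac{dL}{d\varpi}\right)$
vanish together and it is therefore sufficient to maximise log L. We have

\[ \frac{d}{d\varpi}(\log L) = \frac{np}{\varpi} - \frac{nq}{1 - \varpi} = 0, \]

giving

\[ \varpi = p. \qquad (8.2) \]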





The method of Bayes will give the same result if we suppose the possible values of $\varpi$
equally distributed between 0 and 1 for $d\varpi \to 0$. For then

\[ P(\varpi \mid p) \propto \binom{n}{np}\varpi^{np}\chi^{nq}\, d\varpi \qquad (8.3) \]

\[ \propto \varpi^{np}\chi^{nq}, \qquad (8.4) \]

which, as before, is maximised when $\varpi = p$.

There is another way of looking at this problem of estimation. Suppose we took a
large number of samples from the population with a proportion $\varpi$ of A's and $\chi$ of not-A's.
Our estimate of $\varpi$ would be p in each case, p varying from sample to sample; and the mean
value of all such estimates would be

\[ \sum_{j=0}^{n} \frac{j}{n}\binom{n}{j}\varpi^{j}\chi^{n-j} = \varpi(\varpi + \chi)^{n-1} = \varpi, \qquad (8.5) \]

so that the mean value of our estimate over all possible samples is $\varpi$. Such an estimate
is called unbiased: if we follow the rule of estimation the average of our estimates in a
large number of cases will be exactly the correct value $\varpi$. It may thus be argued that the
unbiased estimate should be taken as a reliable estimate of $\varpi$.
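That the mean value of p over all possible samples is exactly the population proportion can be verified for particular cases by summing the binomial terms in exact rational arithmetic. This is a check on chosen values, not a proof; the function name and the argument name `varpi` are hypothetical.

```python
from fractions import Fraction
from math import comb


def mean_of_sample_proportion(n, varpi):
    """Mean of p = j/n over the binomial distribution of j, computed exactly.

    `varpi` is the population proportion, supplied as a Fraction so that no
    rounding enters the sum.
    """
    chi = 1 - varpi
    return sum(
        Fraction(j, n) * comb(n, j) * varpi ** j * chi ** (n - j)
        for j in range(n + 1)
    )


check_a = mean_of_sample_proportion(7, Fraction(3, 10))
check_b = mean_of_sample_proportion(12, Fraction(1, 4))
```

In both cases the result equals the proportion supplied, whatever the sample size.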


8.22. In this case, therefore, all the approaches lead to the same conclusion (a
happy state of affairs which, as we shall see in the sequel, does not always exist). Consider
now the next stage of the problem; what is the reliability of the estimate? In other
words, how far is the estimate likely to differ from the true value?


We know that if the sample value p differs from $\varpi$ by $t\sqrt{\varpi\chi/n}$ the probability of the
difference becomes smaller as t increases. Thus, with an assigned degree of probability
we can say that it is improbable that p will differ from $\varpi$ by more than an assigned amount.
But to specify this amount exactly we require to know $\varpi$; and this is precisely the quantity
we are trying to find.

The problem can only be solved as an approximation. If n is large the standard
error of p is of the order $n^{-1/2}$, so that we may put

\[ \varpi = p + \frac{\xi}{\sqrt{n}}, \]

where $\xi$ is of order unity. Thus, neglecting terms of order $n^{-1}$,

\[ \sqrt{\frac{\varpi\chi}{n}} = \left\{ \frac{1}{n}\left(p + \frac{\xi}{\sqrt{n}}\right)\left(q - \frac{\xi}{\sqrt{n}}\right) \right\}^{1/2} = \sqrt{\frac{pq}{n}} \left\{ 1 + \frac{\xi(q - p)}{2pq\sqrt{n}} \right\}. \qquad (8.6) \]


Thus for large n the standard error of p is approximately equal to $\sqrt{pq/n}$; we thus
reach the fundamental result that in large samples of attributes the standard error may be
calculated by using the estimates of the parameters under estimate instead of the
(unknown) values of those parameters themselves.


Example 8.11

In a sample of 600, 240 are found to possess the attribute A. Thus p = 0.40, np = 240,
$\sqrt{npq} = 12$. We can thus regard it as somewhat improbable that $n\varpi$ differs from 240
by more than twice this amount, 24, and highly improbable that it differs by more than
thrice this amount, 36. We thus can say with some confidence that $n\varpi$ lies in the range
240 ± 24 and with great confidence that it lies in the range 240 ± 36.
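The arithmetic of this example, with the sample proportion used in place of the unknown population value, may be sketched as follows. The function name `frequency_ranges` is hypothetical.

```python
from math import sqrt


def frequency_ranges(successes, n):
    """Standard error of the frequency np and the ranges of twice and thrice it.

    Assumption: n is large, so that the sample proportion p may stand in for
    the unknown population proportion in the standard error.
    """
    p = successes / n
    standard_error = sqrt(n * p * (1 - p))
    twice = (successes - 2 * standard_error, successes + 2 * standard_error)
    thrice = (successes - 3 * standard_error, successes + 3 * standard_error)
    return standard_error, twice, thrice


standard_error, twice, thrice = frequency_ranges(240, 600)
```

Dividing each limit by n gives the corresponding ranges for the proportion itself.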


8.23. We now turn to a general consideration of the problems of sampling which
have been exemplified above. In the first place, let us note the role of the sampling
distribution in this branch of the subject. We construct from the observations some statistic t.
The sampling distribution of this statistic will in general (but not always) depend on some
parameters of the parent population. The probability of the observed t then permits the
making of statements, by inverse probability, likelihood or otherwise, about these parameters,
and thus we are enabled to draw inferences about the parent population. The sampling
distribution is thus fundamental to the whole subject and several subsequent chapters
will be devoted entirely to the methods of finding distributions when the parent is specified.

If we wish to test some hypothesis about the parent which is expressible by the
determination of certain parameters a priori, the problem is fairly simple. Given the values
of the parameters, we can determine from the sampling distribution the probability of the
observed value of the statistic, and use this to assess the acceptability of the hypothesis.
Complications can arise even here, however, for in general, several statistics can be compiled
from the same sample, and they need not necessarily all lead to the same conclusion about
the hypothesis; for example, a sample might have a mean which throws doubt on the
hypothesis and a variance which does not. We shall discuss this difficulty more fully in
the second volume.


8.24. When the parameters of the population are not given a priori, we have the 
double problem of estimating the parameters from the sample and assigning probable 
limits to the estimates so obtained. We have already touched on some of the principles 
of estimation and shall develop the topic more systematically in due course. When we 
have obtained an estimate (itself a statistic) we seek its sampling distribution and therefrom
can assign probable limits to the population value. A special class of cases arises
when we can find a statistic whose sampling distribution depends on only one parameter 
of the population (as in the case of attributes). 


8.25. These latter types of problem permit of certain important approximations,
namely in the case when the sample is large. We saw in Chapter 7 that under very general
conditions the sum of n independent variables, distributed in whatever form, tends to
normality as n tends to infinity. Now many of the ordinary statistics in current use can





be expressed as the sum of variates, e.g. all the moments; and many others may also
be shown to tend to normality for large samples. Thus we may approximate:

(a) By taking a statistic, calculated from the sample as if it were a population, to be
the estimate of the corresponding parameter in that population, e.g. the variance of the
sample may be taken as an estimate of the variance of the population.

(b) By calculating the mean and variance of the sampling distribution by using,
instead of the unknown parameter values, the statistic values calculated according to (a).

(c) By assuming that the distribution is normal and hence determining probabilities
from the normal integral with the aid of the sampling mean and sampling variance (the
latter being the square of the standard error).

8.26. Just how large n must be for such approximations to be valid is not always 
easy to say. For some distributions, particularly that of the mean, quite a satisfactory 
approximation is given by low values of n, say n > 30. For others n has to be much higher 
before the approximation begins to give satisfactory results, e.g. for the product-moment 
correlation coefficient (below, 14.5) even values as high as 500 are not good enough.

8.27. In the following three chapters we discuss the approximate and accurate 
methods for determining sampling distributions. Chapter 9 deals with large samples 
and is thus devoted mainly to methods for determining standard errors. Chapter 10 deals 
with methods for determining sampling distributions exactly. Chapter 11 discusses 
methods of approximating to sampling distributions by finding their lower moments. 

NOTES AND REFERENCES 

For some interesting discussions of problems of sampling generally, see Jensen (1926),
Bowley (1926), Hilton (1924), Kiser (1934), Yates (1935) and Neyman (1934). For a
discussion of random sampling, see Kendall and Babington Smith (1938 and 1939) and Kendall
(1941). The various tables of random numbers are referred to in the Introduction.

Bowley, A. L. (1926), "Measurement of the Precision attained in Sampling," Bull. Int.
Stat. Inst., 22, premier livre.

Hilton, John (1924), "Enquiry by sample; an experiment and its results," Jour. Roy.
Statist. Soc., 87, 544.

Jensen, A. (1926), "Report on the representative method in statistics," Bull. Int. Stat.
Inst., 22, premier livre.

Kendall, M. G., and Babington Smith, B. (1938), "Randomness and random sampling
numbers," Jour. Roy. Statist. Soc., 101, 147, and (1939) "Second paper on
random sampling numbers," Supp. J.R. Statist. Soc., 6, 51.

Kendall, M. G. (1941), "A theory of randomness," Biometrika, 32, 1.

Kiser, C. V. (1934), "Pitfalls in sampling for population study," Jour. Amer. Stat. Assoc.,
29, 250.

Neyman, J. (1934), "On two different aspects of the representative method," Jour. Roy.
Statist. Soc., 97, 558.

Yates, F. (1935), "Some examples of biased sampling," Ann. Eugen., Lond., 6, 202.

Yule, G. Udny (1927), "On reading a scale," Jour. Roy. Statist. Soc., 90, 570.

EXERCISES 

8.1. Of 10,000 babies born in a particular country 5100 are male. Taking this to be
a random sample of the births in that country, show that it throws considerable doubt
on the hypothesis that the sexes are born in equal proportions.

Consider how far this conclusion would be modified if the sample consisted of 1000
births, 510 of which were male.


8.2. If the number of members of a population bearing an attribute A is relatively
small, show that the standard error of the number of A's in the sample is the square root
of that number. Show also that the number of A's in the sample is an unbiased and
a maximum likelihood estimate of the parameter of the Poisson distribution expressing
the distribution of the number of A's in large samples from the population.

8.3. By considering the hypergeometric distribution, show that if samples of n are
drawn from a finite population of N without replacement, and a proportion $\varpi$ of that
population bear an attribute A, then the standard error of the proportion p in the sample is

\[ \sqrt{\frac{\varpi\chi}{n}\cdot\frac{N - n}{N - 1}}. \]

Show also that p is an unbiased estimate of $\varpi$.

8.4. (Tchebycheff's inequality). Show that for any distribution

r.(‘ 

and hence that for any member drawn at random

\[ P\{\, |x - E(x)| > a\sqrt{\mu_2} \,\} < \frac{1}{a^{2}}. \]

Show further that the variance of the sampling distribution of proportions bearing
an attribute A in samples of n from a population of attributes is not greater than $\frac{1}{4n}$. Hence
the probability that an observed proportion p differs from the true proportion $\varpi$ by more
than amount $\epsilon$ is not greater than $\frac{1}{4n\epsilon^{2}}$.

(This gives us an exact result, no assumptions about the normality of the limiting 
form of the binomial or the use of estimates in calculating standard errors being involved. 
The limits are, however, much too wide.) 


8.5. If a proportion $\varpi$ has to be estimated from a simple random sample with
proportion p, and if f is the prior probability of $\varpi$, then the posterior probability of $\varpi$ is,
according to Bayes' theorem, proportional to $f\,\varpi^{np}(1 - \varpi)^{nq}$.

Show that this is a maximum if

\[ \frac{1}{f}\frac{\partial f}{\partial \varpi} + n\,\frac{p - \varpi}{\varpi(1 - \varpi)} = 0. \]

Hence, in general, as n increases, the solution tends to $\varpi = p$, whatever the prior
probability of $\varpi$. In other words, the maximum likelihood estimate is an approximation to
that given by Bayes' theorem as n tends to infinity, even if Bayes' postulate is not assumed.



CHAPTER 9 

STANDARD ERRORS 


9.1. Towards the close of the last chapter we discussed the estimation of statistical
parameters from large samples and the type of judgment of their reliability which depends
on the use of the standard error. It was remarked that, for large samples, an estimate
of a parameter may be obtained by calculating from the sample values the value of the
parameter in the sub-population composed by the sample; and it was established that
for samples of n the standard error gives a valid measure of precision, provided that (a) the
sampling distribution of the statistic under discussion approaches normality and (b) that n is
large in the sense there defined. It was also pointed out that a sufficiently accurate estimate
of standard errors involving parent parameters could be obtained by using as the parameter
values the corresponding statistics from the sample itself.

Since the majority of statistics in current use do tend to normality the theory of large
samples is, in the main, devoted to the determination of standard errors. In this chapter
we describe the principal methods available for the purpose, and incidentally derive formulae
for the standard errors of the various statistics considered in previous chapters. To avoid
the usual square roots associated with the standard error we shall write our results as
sampling variances and covariances. Thus, for a statistic t we write the variance of its
sampling distribution as var t. The covariance of the joint distribution of two statistics
t and u, that is, the first product-moment of their joint sampling distribution, is written
cov (t, u). We shall also consider the distributions in large samples of some statistics which
do not tend to normality.

9.2. By definition, the rth moment of a statistic t, that is the rth moment of its
sampling distribution, is the mean value of $t^{r}$ taken over all possible samples, and may be
written $E(t^{r})$ (cf. 3.35). If the joint distribution of the variates $x_1, \ldots, x_n$, from which
t is calculated, is $dF(x_1, \ldots, x_n)$, the rth moment of t is the integral of $t^{r}\,dF$ (considered
as a function of the x's) over the domain of the x's. In particular, if the sample is simple
and random and the parent distribution is dF, we have

\[ E(t^{r}) = \int_{-\infty}^{\infty} \cdots \int_{-\infty}^{\infty} t^{r}\, dF(x_1) \cdots dF(x_n). \]

We are particularly interested in this chapter in the first and second moments of t,
that is, the mean and variance of its sampling distribution. It may be recalled that the
mean value of a sum is the sum of the mean values and that, if the variables are independent,
the mean value of a product is the product of the mean values (3.36). These two results
will be repeatedly required.

Standard Errors of Moments 

9.3. In the first place we consider the standard errors of the wide class of statistics 
depending on the moments, including the mean, variance, the Pearson measures of skewness 
and kurtosis and the moments and cumulants themselves.

The sampling distributions of moments tend to normality under very general conditions
in virtue of the Central Limit Theorem. In fact, if the parent distribution is represented
by f(x) dx, the distribution of the jth power of x, say y, is easily seen to be
$\frac{1}{j} f(y^{1/j})\, y^{\frac{1}{j} - 1}\, dy$
and the jth moment is the sum of n independent variates, each of which is distributed in
that form.

It is not so obvious that functions of the moments such as $b_1$ and $b_2$ (the sample values
of the parameters $\beta_1$ and $\beta_2$) will tend to normality, and special investigations may have
to be made for particular statistics. Even at the present time it is often assumed without
proof that certain statistics tend to normality, the feeling apparently being (so far as any
feeling uprises into consciousness) that as most statistics do tend to normality the onus
is on an objector to prove that any particular statistic does not. This is very dangerous
to accurate inferential reasoning and the point is one to be borne in mind wherever a standard
error is used.

On a similar point, it should also be remembered that some statistics tend to normality 
more rapidly than others, and a given n may be large for some purposes but not for others. 
So far as it is possible to generalise with safety, we can usually (but not always) assume 
values of n greater than 500 to be large ; values greater than 100 are often great enough 
to be large for our purposes ; values below 100 are suspect in many instances ; and values 
below 30 are very rarely large.

In the following we shall adopt the usual convention in regard to the distinction of
parameters and statistics by writing Greek letters to represent the former and Roman
letters to represent the latter. We have, then, for the rth moment-statistic
corresponding to the rth moment parameter

\[ m'_r = \frac{1}{n}\sum x^{r}, \qquad (9.1) \]

and for the mean-moment

\[ m_r = \frac{1}{n}\sum (x - m'_1)^{r}. \qquad (9.2) \]




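9.4. Consider now the mean value of $m'_r$. Since the x's are independent we have

\[ E(m'_r) = \frac{1}{n}\sum E(x^{r}) = \mu'_r. \qquad (9.3) \]

The sampling variance of $m'_r$ is, by definition, $E\{(m'_r - \mu'_r)^{2}\}$, and thus assuming, as we do
throughout, that the appropriate moments exist,

\[ \operatorname{var}(m'_r) = E\{(m'_r - \mu'_r)^{2}\} = \frac{1}{n^{2}} E\Big\{ \sum_{j} x_j^{2r} + \sum_{j \ne k} x_j^{r} x_k^{r} \Big\} - (\mu'_r)^{2}, \]

the second summation extending over the n(n - 1) cases in which j ≠ k (permutations
of j and k thus being allowed). Since the x's are independent the mean value of the product
is the product of the mean values, and thus

\[ \operatorname{var}(m'_r) = \frac{1}{n^{2}}\big\{ n\mu'_{2r} + n(n - 1)(\mu'_r)^{2} \big\} - (\mu'_r)^{2} = \frac{1}{n}\big\{ \mu'_{2r} - (\mu'_r)^{2} \big\}. \qquad (9.4) \]

This is an exact result.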
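Since (9.4) is exact, it can be checked by complete enumeration for a small discrete parent. The sketch below is an illustration under stated assumptions: the parent is a hypothetical population of three equally likely values, samples are drawn independently (with replacement), and the function names are invented for the purpose.

```python
from fractions import Fraction
from itertools import product


def raw_moment(values, r):
    """rth moment about zero of a parent whose listed values are equally likely."""
    return Fraction(sum(v ** r for v in values), len(values))


def exact_variance_of_sample_moment(values, n, r):
    """Variance of the rth sample moment over all equally likely ordered samples of n."""
    stats = [Fraction(sum(x ** r for x in s), n) for s in product(values, repeat=n)]
    mean = sum(stats) / len(stats)
    return sum((t - mean) ** 2 for t in stats) / len(stats)


def variance_by_formula(values, n, r):
    """Right-hand side of (9.4): (mu'_{2r} - mu'_r^2) / n."""
    return (raw_moment(values, 2 * r) - raw_moment(values, r) ** 2) / n


population = [0, 1, 3]  # hypothetical parent, chosen only for the check
enumerated = exact_variance_of_sample_moment(population, 3, 2)
predicted = variance_by_formula(population, 3, 2)
```

The two quantities agree exactly, as rational numbers, for any choice of n and r; enumeration is feasible only for small n because the number of samples grows as a power of n.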

In a similar way, if we have two moments, their sampling covariance is given by

\[ \operatorname{cov}(m'_q, m'_r) = E\{(m'_q - \mu'_q)(m'_r - \mu'_r)\} = \frac{1}{n^{2}} E\Big(\sum x^{q} \sum x^{r}\Big) - \mu'_q\mu'_r = \frac{1}{n}(\mu'_{q+r} - \mu'_q\mu'_r), \qquad (9.5) \]

which reduces to (9.4) if q = r, as it must, for the first product-moment of two identical
variables is their variance.

The formulae for moments about the mean are not so simple, for the mean itself
is subject to sampling fluctuations. We have, in fact,

\[ E(m_r) = \frac{1}{n} E\Big\{\sum (x - m'_1)^{r}\Big\}. \qquad (9.6) \]

Now putting r = 1 in (9.4) we find

\[ \operatorname{var}(m'_1) = \frac{1}{n}\big\{ \mu'_2 - (\mu'_1)^{2} \big\} = \frac{\mu_2}{n}, \qquad (9.7) \]

and thus the standard error of the mean is $\sqrt{\mu_2/n}$. Consequently, if the distribution is
anywhere near normality, nearly all the values of $m'_1$ will lie within a range of the true value
of order $n^{-1/2}$. To order $n^{-1/2}$ we may then, taking an origin at the mean of the parent
population, neglect powers of $m'_1$ higher than the first, and we then obtain from (9.6)

\[ E(m_r) = \frac{1}{n} E\Big\{\sum (x^{r} - r m'_1 x^{r-1})\Big\} = \frac{1}{n} E\Big[\sum x^{r} - \frac{r}{n}\sum x \sum x^{r-1}\Big]. \]

Now the second term in the expectation on the right will involve the moment $\mu'_1$ and
will vanish since we have chosen our origin at the mean of the parent distribution ($\mu'_1 = 0$).
We shall then have, to order $n^{-1/2}$,

\[ E(m_r) = \mu_r, \qquad (9.8) \]


a result which is not, like (9.3), exact, but is an approximation to order $n^{-1/2}$. To this
order we have

\[ \operatorname{var}(m_r) = E\{(m_r - \mu_r)^{2}\} = E(m_r^{2}) - \mu_r^{2} = \frac{1}{n^{2}} E\Big[\Big\{\sum (x^{r} - r m'_1 x^{r-1})\Big\}^{2}\Big] - \mu_r^{2}. \]

The expectations of terms occurring in the squaring which contain $\mu_1$ vanish, and terms
of higher order in $n^{-1}$ are neglected. The remaining terms give us

\[ \operatorname{var}(m_r) = \frac{1}{n}\big( \mu_{2r} - \mu_r^{2} + r^{2}\mu_2\mu_{r-1}^{2} - 2r\mu_{r-1}\mu_{r+1} \big). \qquad (9.9) \]

Similarly it appears that

\[ \operatorname{cov}(m_r, m_q) = \frac{1}{n}\big( \mu_{r+q} - \mu_r\mu_q + rq\,\mu_2\mu_{r-1}\mu_{q-1} - r\mu_{r-1}\mu_{q+1} - q\mu_{r+1}\mu_{q-1} \big). \]

Example 9.1

From (9.7) we have

$$\operatorname{var}(m'_1) = \frac{\mu_2}{n}.$$

Now, for the height distribution of Table 1.7 we found (Examples 2.1 and 2.6) that $m'_1 = 67.46$, $\sqrt{m_2} = 2.57$. Suppose we regard this distribution as a simple random sample from the adult male inhabitants of the United Kingdom living at the time when the data were collected. What can we say about the mean of the population?

The standard error of the mean depends on $\mu_2$. This is an unknown quantity, but we may, in accordance with the general principles of large sample theory, use $m_2$ instead. We then find

$$\text{Standard error of } m'_1 = \frac{2.57}{\sqrt{8585}} = 0.028 \text{ approximately.}$$

Thus we can say that the population mean probably lies in the range of twice this amount on either side of the sample mean, i.e. in the range $67.46 \pm 0.056$, and very probably in thrice the range, i.e. $67.46 \pm 0.084$. Our estimate of the mean would almost certainly be less than a tenth of an inch in error.
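
The arithmetic of Example 9.1 can be sketched in a few lines. This is only an illustration: the function name is hypothetical, and the sample size n = 8585 is the value assumed here for the height distribution, read from the divisor in the worked figure.

```python
import math

def standard_error_of_mean(sample_sd: float, n: int) -> float:
    """Large-sample standard error sqrt(m2 / n), with m2 standing in for mu_2."""
    return sample_sd / math.sqrt(n)

# Figures quoted in Example 9.1; n = 8585 is an assumption of this sketch.
sample_mean = 67.46
sample_sd = 2.57
n = 8585

se = standard_error_of_mean(sample_sd, n)
print(f"standard error of the mean = {se:.4f}")
for k in (2, 3):
    low, high = sample_mean - k * se, sample_mean + k * se
    print(f"{k} standard errors: {low:.3f} to {high:.3f}")
```

The bands printed use the unrounded standard error, so they may differ in the last digit from those obtained by doubling the rounded figure 0.028.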

Example 9.2

From equation (9.9) with $r = 4$ we find

$$\operatorname{var}(m_4) = \frac{1}{n}(\mu_8 - \mu_4^2 - 8\mu_3\mu_5 + 16\mu_2\mu_3^2).$$

In Chapter 11 we shall show how to obtain this result by other methods of a more exact character and confirm that it is, in fact, exact to order $n^{-1}$.

Example 9.3

To show that in samples from a symmetrical population the first product-moment between the mean and any mean-moment of even order vanishes to order $n^{-1}$.

We have, by definition,

$$\operatorname{cov}(m'_1, m_r) = \frac{1}{n^2}E\{\Sigma x\,\Sigma(x^r - r m'_1 x^{r-1})\},$$

the other terms vanishing, since they involve the unit power of $\mu_1$ if we take an origin at the mean of the parent population,

$$= \frac{1}{n}(\mu_{r+1} - r\mu_2\mu_{r-1}).$$

Now if $r$ is even, $\mu_{r+1}$ and $\mu_{r-1}$, being moments of odd order, will vanish for a symmetrical population and hence

$$\operatorname{cov}(m'_1, m_r) = 0.$$

In the language of the theory of correlation (Chapter 14) the mean and the even moment about the mean are uncorrelated to the same order.


Standard Errors of Functions of Moments 

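9.6. From the expressions we have just derived for the sampling variances and covariances of moments, approximate expressions can be obtained for the sampling variances of functions of the moments. Suppose $\psi(m_1, m_2, \ldots, m_k)$ is such a function. We have the functional relation in differences

$$\Delta\psi = \frac{\partial\psi}{\partial m_1}\Delta m_1 + \frac{\partial\psi}{\partial m_2}\Delta m_2 + \ldots + O(\Delta m)^2. \qquad (9.11)$$

Now any variations in $m$ due to fluctuations of sampling are of order $n^{-1/2}$. To our approximation, therefore, we may neglect the terms of order $(\Delta m)^2$ in (9.11), and the variation $\Delta\psi$ is then seen to be a linear function of the variations $\Delta m$ and is equivalent to an equation in differentials; that is to say, since the $m$'s are distributed normally in the limit, so will $\psi$ be. We have, from (9.11),

$$E(\psi) = \psi(\mu_1, \mu_2, \ldots, \mu_k).$$

Hence, measuring from the means of the $m$'s and $\psi$ we find, squaring and taking mean values,

$$\operatorname{var}(\psi) = \Sigma\Big\{\Big(\frac{\partial\psi}{\partial m_j}\Big)^2\operatorname{var}(m_j)\Big\} + \Sigma\Big\{\frac{\partial\psi}{\partial m_j}\frac{\partial\psi}{\partial m_k}\operatorname{cov}(m_j, m_k)\Big\}, \qquad (9.12)$$

the first summation extending over all the $m$'s appearing in $\psi$ and the second over all $m$'s such that $j \neq k$.

Similarly, for two functions $\psi_1$, $\psi_2$ we have

$$\operatorname{cov}(\psi_1, \psi_2) = \Sigma\Big\{\frac{\partial\psi_1}{\partial m_j}\frac{\partial\psi_2}{\partial m_j}\operatorname{var}(m_j)\Big\} + \Sigma\Big\{\frac{\partial\psi_1}{\partial m_j}\frac{\partial\psi_2}{\partial m_k}\operatorname{cov}(m_j, m_k)\Big\}. \qquad (9.13)$$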



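Example 9.4

To find the sampling variance of the fourth cumulant. We have

$$\kappa_4 = \mu_4 - 3\mu_2^2,$$

$$d\kappa_4 = d\mu_4 - 6\mu_2\,d\mu_2.$$

Hence, squaring and taking mean values,

$$\operatorname{var}(\kappa_4) = \operatorname{var}(\mu_4) - 12\mu_2\operatorname{cov}(\mu_4, \mu_2) + 36\mu_2^2\operatorname{var}(\mu_2).$$

Making the appropriate substitutions from (9.9) and (9.10) we have

$$\operatorname{var}(\kappa_4) = \frac{1}{n}\{\mu_8 - \mu_4^2 - 8\mu_3\mu_5 + 16\mu_2\mu_3^2 - 12\mu_2(\mu_6 - \mu_4\mu_2 - 4\mu_3^2) + 36\mu_2^2(\mu_4 - \mu_2^2)\}$$

$$= \frac{1}{n}(\mu_8 - 12\mu_6\mu_2 - 8\mu_5\mu_3 - \mu_4^2 + 48\mu_4\mu_2^2 + 64\mu_3^2\mu_2 - 36\mu_2^4).$$

For a normal parent, $\mu_4 = 3\sigma^4$, $\mu_6 = 15\sigma^6$, $\mu_8 = 105\sigma^8$ and we have

$$\operatorname{var}(\kappa_4) = \frac{24}{n}\sigma^8.$$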
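
The substitution in Example 9.4 can be checked numerically. The sketch below is illustrative only: the function names are hypothetical, and the two formulae are (9.9) and (9.10) as read here, namely var(m_r) = (μ_2r − μ_r² + r²μ_2μ_{r−1}² − 2rμ_{r−1}μ_{r+1})/n and cov(m_r, m_q) = (μ_{r+q} − μ_rμ_q + rqμ_2μ_{r−1}μ_{q−1} − rμ_{r−1}μ_{q+1} − qμ_{r+1}μ_{q−1})/n.

```python
import math

def normal_central_moment(r: int, sigma: float) -> float:
    """Central moment of order r for a normal parent: 0 if r is odd, (r-1)!! sigma^r if even."""
    if r % 2 == 1:
        return 0.0
    double_factorial = 1
    for k in range(r - 1, 0, -2):
        double_factorial *= k
    return double_factorial * sigma ** r

def var_m(r, mu, n):
    """Sampling variance of the r-th moment about the mean, as in (9.9)."""
    return (mu(2 * r) - mu(r) ** 2 + r * r * mu(2) * mu(r - 1) ** 2
            - 2 * r * mu(r - 1) * mu(r + 1)) / n

def cov_m(r, q, mu, n):
    """Sampling covariance of the r-th and q-th moments about the mean, as in (9.10)."""
    return (mu(r + q) - mu(r) * mu(q) + r * q * mu(2) * mu(r - 1) * mu(q - 1)
            - r * mu(r - 1) * mu(q + 1) - q * mu(r + 1) * mu(q - 1)) / n

sigma, n = 1.5, 100          # arbitrary illustrative values
mu = lambda r: normal_central_moment(r, sigma)

# var(kappa_4) = var(m4) - 12 mu2 cov(m4, m2) + 36 mu2^2 var(m2)
var_kappa4 = var_m(4, mu, n) - 12 * mu(2) * cov_m(4, 2, mu, n) + 36 * mu(2) ** 2 * var_m(2, mu, n)
expected = 24 * sigma ** 8 / n
print(var_kappa4, expected)
```

Any other parent can be tried by passing a different moment function `mu`, provided its moments up to order 8 exist.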


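Example 9.5

To find the sampling variance of the coefficient of variation

$$V = \frac{100\sqrt{m_2}}{m'_1}.$$

Taking logarithms and then differentials we have

$$\frac{dV}{V} = \frac{dm_2}{2m_2} - \frac{dm'_1}{m'_1}.$$

Whence, squaring and taking mean values,

$$\frac{\operatorname{var}V}{V^2} = \frac{\operatorname{var}m_2}{4m_2^2} - \frac{1}{m_2 m'_1}\operatorname{cov}(m_2, m'_1) + \frac{\operatorname{var}m'_1}{m'^{2}_1}.$$

To our order of approximation we may write $\mu_2 = m_2$, $\mu'_1 = m'_1$ and find

$$\operatorname{var}V = \frac{V^2}{n}\Big\{\frac{\mu_4 - \mu_2^2}{4\mu_2^2} - \frac{\mu_3}{\mu_2\mu'_1} + \frac{\mu_2}{\mu'^{2}_1}\Big\}.$$

For a normal parent this gives ($\mu_4 = 3\mu_2^2$, $\mu_3 = 0$):

$$\operatorname{var}V = \frac{V^2}{n}\Big(\frac{1}{2} + \frac{\mu_2}{\mu'^{2}_1}\Big) = \frac{V^2}{2n}\Big(1 + \frac{2V^2}{10^4}\Big)$$

$$= \frac{V^2}{2n} \text{ approximately.}$$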




9.7. On the above principles the standard errors of the more usual functions of moments, such as the measures of skewness and kurtosis, have been worked out and tabulated (see Tables for Statisticians and Biometricians and the references at the end of the chapter).

In applications of results derived by the foregoing methods a few points are to be noted:

(a) The sampling variances are to be used only when the statistic under consideration is calculated from the moments. For instance, if the standard deviation of a normal curve is estimated by taking $\sqrt{\pi/2}$ times the mean deviation of the sample, instead of the more usual root-mean-square, the formula $\operatorname{var}(\sigma) = (\mu_4 - \mu_2^2)/(4n\mu_2)$ derivable from (9.9) is not applicable (see below, 9.11).

(b) From (9.4) and (9.9) it will be seen that the sampling variance of a moment depends on the moment of twice the order, i.e. becomes very large for higher moments, even when $n$ is large. This is the reason why such moments have very limited practical application.

(c) Some measures calculated from the moments tend to normality very slowly. $\sqrt{b_1}$ or $b_1$ (the sample values of $\sqrt{\beta_1}$ or $\beta_1$) are cases in point, and more refined methods which we discuss in Chapter 11 are preferable to the use of the standard error.

(d) The order of the approximation makes it necessary to exercise care in the neighbourhood of vanishing values of standard errors. For instance, if the coefficient of variation $V = 0$ in a sample, the formula of Example 9.5 would give $\operatorname{var}V = 0$. But it does not, of course, follow that there is no variation at all in the population, though none exists in the sample and the presence of variation in the parent will be unlikely if the sample is at all large. When $V = 0$ the quantities neglected in our approximation giving $\operatorname{var}V = V^2/(2n)$ become of some relative importance, though they are still small.

(e) It is interesting to compare the sampling fluctuations, as expressed in the sampling variance, with Sheppard's corrections to the moments. Writing temporarily $s_1^2$ for the uncorrected variance in the sample, $s_2^2$ for the corrected variance, we have

$$\frac{s_2^2}{s_1^2} = 1 - \frac{1}{12}\frac{h^2}{s_1^2},$$

where $h$ is the interval width. For many practical cases, if $d$ is the number of intervals, $dh$ is about equal to $6s_1$ and thus

$$\frac{s_2^2}{s_1^2} = 1 - \frac{3}{d^2} \text{ approximately.}$$

For a normal population we have

$$\sigma = \sqrt{\mu_2}, \qquad d\sigma = \frac{d\mu_2}{2\sqrt{\mu_2}},$$

and hence

$$\operatorname{var}\sigma = \frac{1}{4\mu_2}\operatorname{var}\mu_2 = \frac{\mu_2}{2n} = \frac{\sigma^2}{2n}.$$

Thus if $n$ is, say, 1000, the standard error of $\sigma$ is about $0.0224\sigma$ = 2.24 per cent. of $\sigma$. Sheppard's correction in a case where $d = 20$ is only 0.375 per cent. of $s_1$, i.e. only about a sixth of the standard error. It is as well to make the corrections, even when $n$ is smaller than 1000, in order to avoid systematic error: but the correction should not be misinterpreted as implying a higher degree of reliability in the corrected value than actually exists.

Similar considerations apply a fortiori to the higher moments. 
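
The comparison made in (e) can be reproduced as follows. This is a sketch under the same assumptions as the text (normal parent, $dh$ about equal to $6s_1$); the function names are hypothetical.

```python
import math

def relative_se_of_sigma(n: int) -> float:
    """Standard error of sigma as a fraction of sigma for a normal parent: sqrt(1 / (2n))."""
    return 1.0 / math.sqrt(2 * n)

def relative_sheppard_correction(d: int) -> float:
    """Fractional change in the standard deviation when s2^2 / s1^2 = 1 - 3 / d^2."""
    return 1.0 - math.sqrt(1.0 - 3.0 / d ** 2)

se_fraction = relative_se_of_sigma(1000)            # about 2.24 per cent
correction = relative_sheppard_correction(20)       # about 0.375 per cent
ratio = se_fraction / correction                    # roughly six

print(f"standard error: {100 * se_fraction:.2f} per cent of sigma")
print(f"Sheppard's correction: {100 * correction:.3f} per cent of s1")
print(f"ratio: {ratio:.1f}")
```

The correction is computed from the exact square root rather than the first-order figure $3/(2d^2)$, so it differs very slightly from 0.375 per cent.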



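Standard Errors of Bivariate Moments

9.8. Extensions of the above formulae to the bivariate case are made without difficulty, only slightly more complicated algebra being involved. The reader will be able to verify the following formulae for himself:

$$\operatorname{var}(m'_{r,s}) = \frac{1}{n}(\mu'_{2r,2s} - \mu'^{2}_{r,s}) \qquad (9.14)$$

$$\operatorname{cov}(m'_{r,s}, m'_{u,v}) = \frac{1}{n}(\mu'_{r+u,s+v} - \mu'_{r,s}\mu'_{u,v}) \qquad (9.15)$$

$$\operatorname{var}(m_{r,s}) = \frac{1}{n}(\mu_{2r,2s} - \mu_{r,s}^2 + r^2\mu_{2,0}\mu_{r-1,s}^2 + s^2\mu_{0,2}\mu_{r,s-1}^2 + 2rs\,\mu_{1,1}\mu_{r-1,s}\mu_{r,s-1} - 2r\mu_{r+1,s}\mu_{r-1,s} - 2s\mu_{r,s+1}\mu_{r,s-1}) \qquad (9.16)$$

$$\operatorname{cov}(m_{r,s}, m_{u,v}) = \frac{1}{n}(\mu_{r+u,s+v} - \mu_{r,s}\mu_{u,v} + ru\,\mu_{2,0}\mu_{r-1,s}\mu_{u-1,v} + sv\,\mu_{0,2}\mu_{r,s-1}\mu_{u,v-1}$$
$$\qquad + rv\,\mu_{1,1}\mu_{r-1,s}\mu_{u,v-1} + su\,\mu_{1,1}\mu_{r,s-1}\mu_{u-1,v} - u\mu_{r+1,s}\mu_{u-1,v}$$
$$\qquad - v\mu_{r,s+1}\mu_{u,v-1} - r\mu_{r-1,s}\mu_{u+1,v} - s\mu_{r,s-1}\mu_{u,v+1}). \qquad (9.17)$$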


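Example 9.6

The coefficient of correlation is defined by

$$r = \frac{m_{11}}{\sqrt{(m_{20}m_{02})}}.$$

We have

$$\frac{dr}{r} = \frac{dm_{11}}{m_{11}} - \frac{1}{2}\frac{dm_{20}}{m_{20}} - \frac{1}{2}\frac{dm_{02}}{m_{02}}.$$

Thus

$$\frac{1}{r^2}\operatorname{var}(r) = \frac{\operatorname{var}(m_{11})}{m_{11}^2} + \frac{1}{4}\frac{\operatorname{var}(m_{20})}{m_{20}^2} - \frac{\operatorname{cov}(m_{11}, m_{20})}{m_{11}m_{20}} + \text{similar terms},$$

from which, substituting appropriate values from (9.16) and (9.17) and writing $\rho$ for $r$ in the result, we have

$$\operatorname{var}(r) = \frac{\rho^2}{n}\Big\{\frac{\mu_{22}}{\mu_{11}^2} + \frac{1}{4}\Big(\frac{\mu_{40}}{\mu_{20}^2} + \frac{\mu_{04}}{\mu_{02}^2} + \frac{2\mu_{22}}{\mu_{20}\mu_{02}}\Big) - \Big(\frac{\mu_{31}}{\mu_{11}\mu_{20}} + \frac{\mu_{13}}{\mu_{11}\mu_{02}}\Big)\Big\},$$

$\rho$ being the same function of the $\mu$'s as $r$ is of the $m$'s. For the bivariate normal distribution the substitution of values of Example 3.15 gives

$$\operatorname{var}(r) = \frac{1}{n}(1 - \rho^2)^2.$$

The use of the standard error to test the significance of the correlation coefficient is not, however, to be recommended.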
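
The reduction to $(1 - \rho^2)^2/n$ in Example 9.6 can be verified numerically. The sketch below assumes the general expression var(r) = (ρ²/n){μ22/μ11² + ¼(μ40/μ20² + μ04/μ02² + 2μ22/(μ20 μ02)) − (μ31/(μ11 μ20) + μ13/(μ11 μ02))}, which is how the result is read here, together with the usual bivariate normal moments; the function names are hypothetical.

```python
import math

def bivariate_normal_moments(s1: float, s2: float, rho: float) -> dict:
    """Central product-moments of the bivariate normal distribution up to order four."""
    return {
        "mu20": s1 ** 2, "mu02": s2 ** 2, "mu11": rho * s1 * s2,
        "mu40": 3 * s1 ** 4, "mu04": 3 * s2 ** 4,
        "mu22": (1 + 2 * rho ** 2) * s1 ** 2 * s2 ** 2,
        "mu31": 3 * rho * s1 ** 3 * s2, "mu13": 3 * rho * s1 * s2 ** 3,
    }

def var_r(n: int, mu: dict) -> float:
    """Large-sample variance of the correlation coefficient (requires mu11 != 0)."""
    rho = mu["mu11"] / math.sqrt(mu["mu20"] * mu["mu02"])
    bracket = (mu["mu22"] / mu["mu11"] ** 2
               + 0.25 * (mu["mu40"] / mu["mu20"] ** 2 + mu["mu04"] / mu["mu02"] ** 2
                         + 2 * mu["mu22"] / (mu["mu20"] * mu["mu02"]))
               - (mu["mu31"] / (mu["mu11"] * mu["mu20"]) + mu["mu13"] / (mu["mu11"] * mu["mu02"])))
    return rho ** 2 / n * bracket

n, rho = 50, 0.6                                   # arbitrary illustrative values
general = var_r(n, bivariate_normal_moments(2.0, 0.5, rho))
normal_case = (1 - rho ** 2) ** 2 / n
print(general, normal_case)
```

The expression divides by $\mu_{11}$, so the sketch does not apply when $\rho = 0$ exactly; the limiting value there is $1/n$.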


Standard Errors of Quantiles

9.9. Among the various quantities measuring location and dispersion which we considered in Chapter 2 there was one group, namely the quantiles, which are not algebraic functions of the observations and whose sampling variances cannot accordingly be determined by the above methods. We proceed to consider them now.

Suppose the parent distribution is represented by $dF(x) = f(x)\,dx$. The probability that, of a sample of $n$, $(l - 1)$ fall below a value $x_1$, one falls in the range $x_1 \pm \frac{1}{2}dx_1$, and the remaining $(n - l)$ fall above $x_1$ is proportional to

$$F_1^{l-1} f(x_1)\,dx_1\{1 - F(x_1)\}^{n-l} = F_1^{l-1}(1 - F_1)^{n-l}\,dF_1, \qquad (9.18)$$

where $F_1 = F(x_1)$. This expression is accordingly the distribution function of $x_1$, the member of the sample below which a proportion $l/n$ of the members fall, i.e. the $l$th quantile.

Put

$$l = nq$$

so that

$$n - l = n(1 - q) = np, \text{ say.}$$

The distribution (9.18) has a modal value given by differentiating the frequency function with respect to $x_1$, i.e. (taking logarithms first) by

$$(l - 1)\frac{f_1}{F_1} - (n - l)\frac{f_1}{1 - F_1} + \frac{f'_1}{f_1} = 0, \qquad (9.19)$$

this equation being satisfied by the modal value $\tilde{x}_1$. Now for large $n$, the factor $f'_1/f_1$ will in general be small compared with the other terms in (9.19), $l$ and $n - l$ being large. We may therefore neglect it, and (9.19) becomes, to order $n^{-1}$,

$$\frac{q}{F} - \frac{p}{1 - F} = 0$$

or

$$F(\tilde{x}_1) = q.$$

This is in accordance with our general assumptions. To this order the quantile of the sample is the quantile of the parent.

Now let us investigate the distribution (9.18) in the neighbourhood of the modal value. Put

$$F_1 = q + \xi.$$

(9.18) becomes (neglecting constants)

$$(q + \xi)^{nq}(p - \xi)^{np}.$$

Taking logarithms and expanding we have, except for constants,

$$nq\log\Big(1 + \frac{\xi}{q}\Big) + np\log\Big(1 - \frac{\xi}{p}\Big) = -\frac{n\xi^2}{2pq} + \text{terms of order } n \text{ and higher degree in } \xi.$$

Now for large samples $\xi$ will be small compared with $q$, and we neglect the terms of higher order. Thus the distribution of $\xi$ is

$$dF \propto \exp\Big\{-\frac{n\xi^2}{2pq}\Big\}\,d\xi$$

or, evaluating the necessary constant by integration,

$$dF = \frac{1}{\sqrt{2\pi}\sqrt{(pq/n)}}\exp\Big\{-\frac{n\xi^2}{2pq}\Big\}\,d\xi, \qquad (9.20)$$

showing that $\xi$ is in the limit distributed normally with variance

$$\frac{pq}{n}. \qquad (9.21)$$

This is the variance of $\xi$, which is a proportion. To find the variance of $x_1$ we note that $d\xi = dF_1 = f_1\,dx_1$, and hence that

$$\operatorname{var}(x_1) = \frac{pq}{n f_1^2}. \qquad (9.22)$$

In practice this formula is often applied to grouped frequency-distributions, and in such applications it is to be remembered that $f_1$, the ordinate of the parent, is to be taken as the frequency per unit interval at $x_1$, this being the best estimate of the ordinate.


Example 9.7

If $x_1$ is the median, $p = q = \frac{1}{2}$ and we have

$$\operatorname{var}(\text{median}) = \frac{1}{4nf_1^2},$$

where $f_1$ is the median ordinate. For instance, if the parent population is normal, the median ordinate is (from Appendix Table 1) $0.39894/\sigma$, $\sigma^2$ being the variance of the parent. Hence the standard error of the median is

$$\frac{\sigma}{\sqrt{n} \times 2 \times 0.39894} = 1.2533\frac{\sigma}{\sqrt{n}}.$$

The standard error of the mean in samples of $n$ from a normal population is $\sigma/\sqrt{n}$, which is thus considerably smaller than the standard error of the median.
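
Formula (9.22), read here as $\operatorname{var}(x_1) = pq/(nf_1^2)$, is easily evaluated for a normal parent with the standard library. The sketch below is illustrative; `quantile_se_factor` is a hypothetical helper returning the multiplier of $\sigma/\sqrt{n}$.

```python
import math
from statistics import NormalDist

def quantile_se_factor(q: float, parent: NormalDist = NormalDist()) -> float:
    """Return sqrt(pq) / f(x_q), so that the standard error of the quantile is factor / sqrt(n)."""
    x_q = parent.inv_cdf(q)            # quantile of the parent
    ordinate = parent.pdf(x_q)         # frequency per unit interval at the quantile
    return math.sqrt(q * (1.0 - q)) / ordinate

median_factor = quantile_se_factor(0.5)      # compare 1.2533 in Example 9.7
quartile_factor = quantile_se_factor(0.25)   # lower quartile of a standard normal parent

n = 400
print(f"s.e. of median   = {median_factor / math.sqrt(n):.4f} (sigma = 1, n = {n})")
print(f"s.e. of quartile = {quartile_factor / math.sqrt(n):.4f}")
```

For a parent with standard deviation $\sigma$ the factor scales with $\sigma$; passing `NormalDist(mu, sigma)` handles this automatically.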


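9.10. To find the covariance of two quantiles we generalise equation (9.18). If we have a random sample of $n$ individuals the probability that $(l - 1)$ lie below $x_1$, one lies at $x_1 \pm \frac{1}{2}dx_1$, $(n - l - m)$ lie between $x_1$ and $x_2$, one at $x_2 \pm \frac{1}{2}dx_2$ and the remaining $(m - 1)$ above $x_2$ is

$$dF \propto F_1^{l-1}(F_2 - F_1)^{n-l-m}(1 - F_2)^{m-1}\,dF_1\,dF_2, \qquad (9.23)$$

where $F_1 = F(x_1)$, $F_2 = F(x_2)$.

We put

$$l = q_1 n, \qquad m = p_2 n$$

and find, for the equations giving the modal values corresponding to (9.19),

$$\frac{q_1}{F_1} - \frac{q_2 - q_1}{F_2 - F_1} = 0,$$

$$\frac{q_2 - q_1}{F_2 - F_1} - \frac{p_2}{1 - F_2} = 0,$$

giving, for the limiting modal values,

$$F(\tilde{x}_1) = q_1, \qquad F(\tilde{x}_2) = q_2. \qquad (9.24)$$

The conditions as to the relative smallness of the neglected terms are satisfied in any ordinary case. Now put

$$F_1 = q_1 + \xi_1, \qquad F_2 = q_2 + \xi_2.$$

The joint distribution of $\xi_1$ and $\xi_2$ then becomes

$$dF \propto (q_1 + \xi_1)^{nq_1}(q_2 - q_1 + \xi_2 - \xi_1)^{n(q_2 - q_1)}(p_2 - \xi_2)^{np_2}\,d\xi_1\,d\xi_2.$$

On proceeding as in the previous section, taking logarithms, expanding and neglecting terms in $\xi^3$ and higher, we find ultimately

$$dF \propto \exp\Big[-\frac{n}{2}\Big\{\frac{q_2}{q_1(q_2 - q_1)}\xi_1^2 - \frac{2}{q_2 - q_1}\xi_1\xi_2 + \frac{p_1}{p_2(q_2 - q_1)}\xi_2^2\Big\}\Big]\,d\xi_1\,d\xi_2. \qquad (9.25)$$

Thus the joint distribution of $\xi_1$ and $\xi_2$ tends to the bivariate normal form, and on comparing (9.25) with the canonical form (Example 3.16) we see that

$$\frac{1}{(1 - \rho^2)\operatorname{var}(\xi_1)} = \frac{nq_2}{(q_2 - q_1)q_1},$$

$$\frac{1}{(1 - \rho^2)\operatorname{var}(\xi_2)} = \frac{np_1}{(q_2 - q_1)p_2},$$

$$\frac{\rho}{(1 - \rho^2)\sqrt{\{\operatorname{var}(\xi_1)\operatorname{var}(\xi_2)\}}} = \frac{n}{q_2 - q_1},$$

whence it is easy to find

$$\operatorname{var}(\xi_1) = \frac{p_1q_1}{n}, \qquad \operatorname{var}(\xi_2) = \frac{p_2q_2}{n}, \qquad \operatorname{cov}(\xi_1, \xi_2) = \frac{p_2q_1}{n}. \qquad (9.26)$$

The asymmetry of the result for the covariance is due to the fact that $p_2$ relates necessarily to the upper quantile. For the corresponding expression in $x_1$ and $x_2$ we have

$$\operatorname{cov}(x_1, x_2) = \frac{p_2q_1}{nf_1f_2}. \qquad (9.27)$$

With equations (9.26) and (9.27) we can find expressions for the variances of the quantile range and similar statistics.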


Example 9.8

The variance of the difference $\delta$ of two quantiles at $x_1$ and $x_2$ is given by

$$d\delta = dx_1 - dx_2,$$

$$\operatorname{var}(\delta) = \operatorname{var}(x_1) + \operatorname{var}(x_2) - 2\operatorname{cov}(x_1, x_2)$$

$$= \frac{1}{n}\Big\{\frac{p_1q_1}{f_1^2} + \frac{p_2q_2}{f_2^2} - \frac{2p_2q_1}{f_1f_2}\Big\}.$$

When the quantiles are the two quartiles, $p_2 = q_1 = \frac{1}{4}$, $p_1 = q_2 = \frac{3}{4}$ and for the variance of the semi-interquartile range we have

$$\operatorname{var}(\text{s.i.q.}) = \frac{1}{64n}\Big\{\frac{3}{f_1^2} + \frac{3}{f_2^2} - \frac{2}{f_1f_2}\Big\},$$

where $f_1$, $f_2$ are the frequencies per unit interval at the two quartiles, $f_2$ relating to the upper quartile. As $f_1$, $f_2$ have to be estimated from the sample, we may also write

$$\operatorname{var}(\text{s.i.q.}) = \frac{\sigma^2}{64n}\Big\{\frac{3}{g_1^2} + \frac{3}{g_2^2} - \frac{2}{g_1g_2}\Big\},$$

where $g_1$, $g_2$ are the actual sample frequencies at the quartiles and $\sigma^2$ is the sample variance.

For instance, if the parent distribution is normal, $g_1 = g_2$ and we find

$$\operatorname{var}(\text{s.i.q.}) = \frac{\sigma^2}{16ng_1^2}.$$

From the tables the deviate corresponding to the quartile is 0.6745 and the ordinate at this point, $g_1 = 0.3178$, so that the standard error of the semi-interquartile range is

$$\frac{\sigma}{4 \times 0.3178\sqrt{n}} = 0.7867\frac{\sigma}{\sqrt{n}}.$$


9.11. In amplification of the point mentioned in 9.7 (a) it is worth while stressing again the fact that a standard error is related to the way in which a parameter is estimated. For instance, the standard deviation of a normal curve can be estimated from a sample in several ways: from the second moment; by taking $\sqrt{\pi/2}$ times the mean deviation; by taking $1/0.6745$ times the semi-interquartile range; and so on. Each method will have its appropriate standard error, that for the first, for example, being $\sigma/\sqrt{(2n)}$, and that for the third $0.7867\sigma/(0.6745\sqrt{n})$. At a later stage considerations such as this will lead us to the inquiry, what is the estimate, if any, with the minimum sampling variance? For present purposes it is enough to note the importance of not using a quoted formula without reference to the method of estimation of the parameter concerned.


9.12. The methods we have developed provide the standard errors for large samples of most of the measures of location and dispersion and the measures introduced in Chapters 2 and 3. There remain a few on which we have not yet touched, viz. the mean deviation, Gini's coefficient of mean difference, and the range. We consider them briefly in turn.


Standard Error of the Mean Deviation 

9.13. The mean deviation, as was pointed out in Chapter 2, is relatively speaking a complicated function, and the mathematical difficulties attendant on absolute values are well illustrated in discussions of its sampling variance. In fact, no general discussion of the sampling distribution appears to have been undertaken. The following exact value of the sampling variance in samples from a normal population was discovered by Helmert in 1876 and rediscovered by Fisher in 1920.

$$\operatorname{var}(\text{m.d.}) = \frac{2\sigma^2(n - 1)}{n^2\pi}\Big\{\frac{\pi}{2} + \sqrt{\{n(n - 2)\}} - n + \sin^{-1}\frac{1}{n - 1}\Big\}$$

$$\sim \frac{\sigma^2}{n}\Big(1 - \frac{2}{\pi}\Big) \text{ for large } n. \qquad (9.28)$$

The proof follows the general methods described in the next chapter. It is quoted here for the sake of completeness.*

* The distribution function of the mean deviation in normal samples from 2 to 10 has been tabulated by Godwin and Hartley in Biometrika (1945), 33, 254.
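
The exact expression and its large-sample form can be compared numerically. The sketch below takes (9.28) in the form $2\sigma^2(n-1)/(n^2\pi)\,\{\pi/2 + \sqrt{n(n-2)} - n + \sin^{-1}(1/(n-1))\}$, which is how the formula is read here; the function names are hypothetical.

```python
import math

def var_mean_deviation_exact(n: int, sigma: float = 1.0) -> float:
    """Exact sampling variance of the mean deviation in normal samples of n (n >= 2)."""
    bracket = math.pi / 2 + math.sqrt(n * (n - 2)) - n + math.asin(1.0 / (n - 1))
    return 2 * sigma ** 2 * (n - 1) / (n ** 2 * math.pi) * bracket

def var_mean_deviation_large_n(n: int, sigma: float = 1.0) -> float:
    """Large-sample approximation (sigma^2 / n)(1 - 2 / pi)."""
    return sigma ** 2 / n * (1 - 2 / math.pi)

for n in (2, 10, 100, 10000):
    exact = var_mean_deviation_exact(n)
    approx = var_mean_deviation_large_n(n)
    print(f"n = {n:5d}  exact = {exact:.6f}  approx = {approx:.6f}  ratio = {exact / approx:.4f}")
```

For $n = 2$ the mean deviation is half the absolute difference of the two observations, whose variance is $\sigma^2(\frac{1}{2} - 1/\pi)$; this gives an independent check on the exact form.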







Standard Error of the Mean Difference

9.14. Nair (1936) has given a general expression for the standard error of Gini's mean difference without repetition. In the manner of 2.24 it is easy to see that the coefficient may be written

$$\Delta_1 = \frac{2}{n(n - 1)}\{2U - (n + 1)V\}, \qquad (9.29)$$

where

$$U = \sum_{j=1}^{n} jx_j, \qquad V = \sum_{j=1}^{n} x_j,$$

and we write $n$ in preference to $N$ for the number of observations, since we are dealing with a sample.

In our usual notation, the probability that the $j$th observation in order of magnitude in a series of $n$ observations has value in the range $x_j \pm \frac{1}{2}dx_j$ is

$$dF = \frac{n!}{(j - 1)!\,(n - j)!}F_j^{j-1}(1 - F_j)^{n-j}\,dF_j.$$

Hence the mean value of $U$ is given by

$$E(U) = n\int_{-\infty}^{\infty} x\,dF\Big\{\sum_{j=1}^{n}\frac{(n - 1)!}{(j - 1)!\,(n - j)!}F^{j-1}(1 - F)^{n-j} + (n - 1)F\sum_{j=2}^{n}\frac{(n - 2)!}{(j - 2)!\,(n - j)!}F^{j-2}(1 - F)^{n-j}\Big\}$$

$$= n\int_{-\infty}^{\infty} x\,dF\{1 + (n - 1)F\}. \qquad (9.30)$$

Similarly

$$E(V) = n\int_{-\infty}^{\infty} x\,dF. \qquad (9.31)$$

Thus

$$E(\Delta_1) = \frac{2}{n(n - 1)}\{2E(U) - (n + 1)E(V)\} = 2\int_{-\infty}^{\infty} x(2F - 1)\,dF.$$

In the same way (but we omit the details) Nair finds

$$E(\Delta_1^2) = \frac{4}{n(n - 1)}\{J_1 + 2(n - 2)J_2\}, \qquad (9.32)$$





where

$$J_1 = \int_{-\infty}^{\infty} x^2\{(n - 1) - 4(n - 2)F + 4(n - 2)F^2\}\,dF$$

J, *= r ar.dF.p dFy{{n - 3) - 2{w - 6)Fx - 2(» - 1)F, + 4(n - Z)F^ F.> 

J—00^ J-OO 

and finally

$$\operatorname{var}(\Delta_1) = E(\Delta_1^2) - \{E(\Delta_1)\}^2. \qquad (9.33)$$

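For three particular cases these integrals are worked out, giving:

Normal Parent:

$$dF = \frac{1}{\sigma\sqrt{2\pi}}e^{-\frac{x^2}{2\sigma^2}}\,dx, \qquad -\infty < x < \infty$$

$$E(\Delta_1) = \frac{2\sigma}{\sqrt{\pi}} \qquad (9.34)$$

$$\operatorname{var}(\Delta_1) = \frac{4\sigma^2}{n(n - 1)}\Big\{\frac{n + 1}{3} + \frac{2(n - 2)\sqrt{3}}{\pi} - \frac{2(2n - 3)}{\pi}\Big\} \sim \frac{\sigma^2}{n}(0.8068)^2. \qquad (9.35)$$

Exponential Parent:

$$dF = \frac{1}{\sigma}e^{-\frac{x}{\sigma}}\,dx, \qquad 0 < x < \infty$$

$$E(\Delta_1) = \sigma \qquad (9.36)$$

$$\operatorname{var}(\Delta_1) = \frac{2(2n - 1)\sigma^2}{3n(n - 1)} \qquad (9.37)$$

$$\sim \frac{4\sigma^2}{3n}. \qquad (9.38)$$

Rectangular Parent:

$$dF = \frac{1}{k}\,dx, \qquad 0 < x < k$$

$$E(\Delta_1) = \tfrac{1}{3}k \qquad (9.39)$$

$$\operatorname{var}(\Delta_1) = \frac{(n + 3)k^2}{45n(n - 1)} \qquad (9.40)$$

$$\sim \frac{k^2}{45n}. \qquad (9.41)$$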
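
The constant 0.8068 quoted for the normal parent can be recovered from the exact variance. The sketch below assumes the exact form $4\sigma^2/\{n(n-1)\}\,\{(n+1)/3 + 2(n-2)\sqrt{3}/\pi - 2(2n-3)/\pi\}$; this is the expression as reconstructed here rather than a quotation, and the function name is hypothetical.

```python
import math

def var_mean_difference_normal(n: int, sigma: float = 1.0) -> float:
    """Sampling variance of Gini's mean difference (without repetition), normal parent."""
    bracket = ((n + 1) / 3
               + 2 * (n - 2) * math.sqrt(3) / math.pi
               - 2 * (2 * n - 3) / math.pi)
    return 4 * sigma ** 2 / (n * (n - 1)) * bracket

# For n = 2 the mean difference is |x1 - x2|, with variance sigma^2 (2 - 4 / pi).
check_n2 = var_mean_difference_normal(2)

# sqrt(n * var) tends to the constant multiplying sigma / sqrt(n) in (9.35).
large_n = 10 ** 6
limiting_constant = math.sqrt(large_n * var_mean_difference_normal(large_n))

print(f"n = 2: {check_n2:.5f}")
print(f"limiting constant: {limiting_constant:.4f}")
```

The case $n = 2$ gives an independent check, since the variance of the absolute difference of two normal variates is known directly.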


9.15. We now turn to consider some statistics which are peculiar in more ways than one: the extreme values of a sample (or, more generally, the $m$th value from the top or the bottom of a sample) and the range. One of the unusual features of the distribution of $m$th values is that as $n$ increases it diverges more and more from normality; and it seems doubtful whether the distribution of range tends to any limit at all; certainly it does not tend to normality in all cases.

A further difference between the quantities we are now considering and the others we have already discussed is that $m$th values and range in the sample are not used to estimate $m$th values and range in the population. In fact most of the results we shall obtain relate to parents which have an infinite range. What, then, is the use of these statistics? The answer is that they may provide an estimate of parent parameters which do exist. For instance, an estimate of the variance of a normal population is given by dividing the sample range $w$ by a constant depending on the number in the sample. This estimate, though not so accurate as some (in the sense that its sampling variance is not so small), is extremely easy to calculate and is often useful. We wish, therefore, to know its sampling variance, that is to say the sampling variance of the range.

Distribution of mth Values

Distribution of mth Values 

9.16. We consider first of all the distribution of $m$th values from the top, that for $m$th values from the bottom being similar. In particular, $m$ may be unity, in which case we get the greatest member of a sample.

Quantiles are special cases of this class of statistic, the ratio $m/n$ remaining finite as $n$ tends to infinity. In the case we now discuss $m$ remains finite, so that the ratio $m/n$ tends to zero.

The distribution of $m$th values from the top is, as in equation (9.18),

$$dF \propto F_1^{n-m}(1 - F_1)^{m-1}\,dF_1. \qquad (9.42)$$

When the form of $F_1$ is known, this equation is sometimes capable of exact solution, as in the following example.

Example 9.9

Consider the rectangular distribution $dF = dx$, $0 \leq x \leq 1$. Here $F(x) = x$ and the distribution of the $m$th value from the top is

$$dF \propto x^{n-m}(1 - x)^{m-1}\,dx,$$

the Pearson Type I curve. We have, for the first moment,

$$\mu'_1 = \frac{n - m + 1}{n + 1} = 1 - \frac{m}{n + 1}$$

and for the variance

$$\mu_2 = \frac{m(n - m + 1)}{(n + 1)^2(n + 2)},$$

but this sampling variance cannot be used in the ordinary way if $m$ is finite, for the curve does not tend to normality. However, we may easily obtain exact values for the probabilities associated with the $m$th values, from the integrals of the Type I curve. In fact, the probability that a given value will not be attained or exceeded is, in the usual notation, $I_x(n - m + 1, m)$.

9.17. From this point our discussion of the limiting form of (9.42) as n tends to infinity 
is confined to the case wherein the parent population is a continuous frequency-distribution 
of unlimited range of the exponential type, i.e. such that it tends to zero with large x as fast 

d 

as or faster than dF — and that exceeds some fixed number as x 

tends to infinity. The normal curve obeys this criterion, which implies, among other things,
that all moments exist.





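For the mode of (9.42) we have, as in (9.19),

$$(n - m)\frac{f_1}{F_1} - (m - 1)\frac{f_1}{1 - F_1} + \frac{f'_1}{f_1} = 0.$$

For large $n$ and finite $m$ the mode $\tilde{x}$ will be a large value, and both $f_1$ and $1 - F_1$ tend to zero. Accordingly we may put

$$\frac{f'_1}{f_1} = -\frac{f_1}{1 - F_1}$$

in accordance with the rule known as L'Hôpital's. Hence

$$(n - m)\frac{f_1}{F_1} - m\frac{f_1}{1 - F_1} = 0,$$

$$F(\tilde{x}) = 1 - \frac{m}{n}. \qquad (9.43)$$

Now expand $F$ in the neighbourhood of $\tilde{x}$ by Taylor's theorem. We get

$$F(x) = F(\tilde{x}) + (x - \tilde{x})f(\tilde{x}) + \frac{(x - \tilde{x})^2}{2!}f'(\tilde{x}) + \ldots$$

$$= 1 - \frac{m}{n} + (x - \tilde{x})f(\tilde{x}) - \frac{n}{m}\frac{(x - \tilde{x})^2}{2!}f^2(\tilde{x}) + \ldots$$

(the last term in virtue of $f' = -\dfrac{f^2}{1 - F} = -\dfrac{n}{m}f^2$ in the neighbourhood of the mode)

$$= 1 - \frac{m}{n}\exp\Big\{-\frac{n}{m}(x - \tilde{x})f(\tilde{x})\Big\} \text{ approximately.} \qquad (9.44)$$

The distribution of the $m$th value from the top may be written, from (9.42),

$$dF_m \propto F^{n-m}(1 - F)^{m-1}\,d(F).$$

To our approximation, from (9.44), since $m/n$ is small,

$$F^{n-m} = \Big(1 - \frac{m}{n}e^{-y_m}\Big)^{n-m} = \exp(-me^{-y_m}),$$

$$(1 - F)^{m-1} = \Big(\frac{m}{n}e^{-y_m}\Big)^{m-1},$$

$$d(F) = \frac{m}{n}e^{-y_m}\,dy_m.$$

Thus

$$dF_m \propto \exp(-my_m - me^{-y_m})\,dy_m$$

and, on evaluating the constants by integration, we get

$$dF_m = \frac{m^m}{(m - 1)!}\exp(-my_m - me^{-y_m})\,dy_m. \qquad (9.45)$$

The new variable is defined in terms of $x - \tilde{x}$ by

$$y_m = \frac{n}{m}(x - \tilde{x})f(\tilde{x}).$$

In a similar way, for the $m$th value from the bottom we find

$$dF_m = \frac{m^m}{(m - 1)!}\exp(my - me^{y})\,dy, \qquad (9.46)$$

$y$ being written for the variable defined by

$$F = \frac{m}{n}e^{y}.$$

In particular, for the extremes ($m = 1$) we have

$$dF = \exp(-y - e^{-y})\,dy \quad \text{(top value)} \qquad (9.47)$$

$$dF = \exp(y - e^{y})\,dy \quad \text{(bottom value)} \qquad (9.48)$$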

9.18. These unusual limiting forms, which are due to Gumbel (1934), the extreme cases being due to Fisher and Tippett (1928), are very far from normal for moderate or low values of $m$. For the moments of (9.45) we have (omitting the suffix of $y$ for convenience)

$$\mu'_1 = \frac{m^m}{(m - 1)!}\int_{-\infty}^{\infty} y\exp(-my - me^{-y})\,dy.$$

Put $e^{-y} = \dfrac{t}{m}$. We get

$$\mu'_1 = \frac{1}{(m - 1)!}\int_0^{\infty}(\log m - \log t)\,t^{m-1}e^{-t}\,dt$$

$$= \log m - \frac{d}{dm}\{\log\Gamma(m)\}$$

$$= \log m + \gamma - \sum_{r=1}^{m-1}\frac{1}{r},$$

where $\gamma$ is Euler's constant. For the $r$th moment about the mean we have

$$\mu_r = \frac{(-1)^r}{(m - 1)!}\int_0^{\infty}(\log t - \log m + \mu'_1)^r\,t^{m-1}e^{-t}\,dt.$$

These formulae have been worked out further by Gumbel, from whose numerical results the following are chosen:

     m      Mean     √β1      β2 − 3
     1      0.577    1.139    2.400
     3      0.176    0.621    0.763
     5      0.103    0.468    0.437
    10      0.051    0.324    0.212

These figures, which relate to the distribution from the top, show clearly that the limiting distribution is far from normal. The distribution from the bottom is similar, odd moments including the mean having the same magnitude but opposite sign, even moments being the same.
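
The means and skewness figures quoted above can be recomputed from the cumulants of the limiting form. The sketch below assumes the mean is $\log m + \gamma - \sum_{r<m} 1/r$ and that the second and third cumulants are $\pi^2/6 - \sum_{r<m} r^{-2}$ and $2\{\zeta(3) - \sum_{r<m} r^{-3}\}$; these follow from the gamma-function form of the distribution as read here, and the function names are hypothetical.

```python
import math

EULER_GAMMA = 0.5772156649015329   # Euler's constant
ZETA_3 = 1.2020569031595942        # sum of 1 / j^3 over all j >= 1

def mean_of_mth_value(m: int) -> float:
    """Mean of the limiting distribution of the m-th value from the top."""
    return math.log(m) + EULER_GAMMA - sum(1.0 / r for r in range(1, m))

def skewness_of_mth_value(m: int) -> float:
    """Square root of beta_1 for the limiting distribution of the m-th value from the top."""
    kappa2 = math.pi ** 2 / 6 - sum(1.0 / r ** 2 for r in range(1, m))
    kappa3 = 2 * (ZETA_3 - sum(1.0 / r ** 3 for r in range(1, m)))
    return kappa3 / kappa2 ** 1.5

for m in (1, 3, 5, 10):
    print(f"m = {m:2d}  mean = {mean_of_mth_value(m):.3f}  sqrt(beta_1) = {skewness_of_mth_value(m):.3f}")
```

The skewness falls steadily as $m$ rises but remains well away from zero for small $m$, in line with the tabulated values.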



Moreover, the limiting forms (9.45) and (9.46) are reached extremely slowly. Fisher and
Tippett (1928) have shown in the case m = 1 that they do not provide a very satisfactory 
approximation for values of n less than 10^*. For practical purposes, therefore, there is still 
no adequate general approximate form for the distribution of mth values. 


9.19. The case $m = 1$, corresponding to the extremes of the sample, has, however, been studied in more detail. In this case equation (9.42) becomes

$$dF = nF_1^{n-1}\frac{dF_1}{dx_1}\,dx_1.$$

By using the published tables of the normal integral $F_1$, Tippett (1925) has evaluated $F$ for values of $n$ up to 1000, and given diagrams yielding the variances, and $\beta_1$ and $\beta_2$, which are reproduced in Tables for Statisticians and Biometricians, Part II. The following values are quoted from his results:

       n     Mean     Standard Deviation     β1       β2
       2     0.564          0.826           0.019    3.062
       5     1.163          0.669           0.092    3.202
      10     1.539          0.587           0.168    3.331
     100     2.508          0.429           0.429    3.766
     500     3.037          0.370           0.570    4.003
    1000     3.241          0.351           0.618    4.088

The values of $\beta_1$ and $\beta_2$ illustrate the point that as $n$ increases, the distribution of the extreme value diverges more and more from the normal form.

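The limiting values as $n \to \infty$ can be derived by the use of characteristic functions. In fact, we have for the distribution of the top value,

$$\phi(t) = \int_{-\infty}^{\infty} e^{itx}\exp(-x - e^{-x})\,dx,$$

which, on substituting $e^{-x} = \xi$, gives

$$\phi(t) = \int_0^{\infty}\xi^{-it}e^{-\xi}\,d\xi = \Gamma(1 - it).$$

Hence

$$\kappa_1(it) + \kappa_2\frac{(it)^2}{2!} + \ldots = \log\phi = \log\Gamma(1 - it) = \gamma(it) + \frac{S_2}{2}(it)^2 + \frac{S_3}{3}(it)^3 + \ldots \text{ etc.*}$$

Thus

$$\kappa_1 = \mu'_1 = \gamma$$

$$\kappa_2 = \mu_2 = S_2 = \frac{\pi^2}{6} = 1.644934$$

$$\kappa_3 = \mu_3 = 2S_3 = 2.404114$$

$$\kappa_4 = \mu_4 - 3\mu_2^2 = 6S_4 = \frac{\pi^4}{15} = 6.493939$$

whence

$$\beta_1 = 1.299, \qquad \beta_2 = 5.4.$$

* Cf. Edwards, Integral Calculus, vol. 2, article 916. $S_r$ here is $\sum_{j=1}^{\infty}\dfrac{1}{j^r}$.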
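
The limiting values of $\beta_1$ and $\beta_2$ follow directly from the cumulants just listed, as the short check below illustrates. The numerical value of $S_3$ is entered as a constant; the variable names are hypothetical.

```python
import math

S3 = 1.2020569031595942            # sum of 1 / j^3 over all j >= 1

kappa2 = math.pi ** 2 / 6          # = S_2
kappa3 = 2 * S3                    # = 2 S_3
kappa4 = math.pi ** 4 / 15         # = 6 S_4

beta1 = kappa3 ** 2 / kappa2 ** 3
beta2 = 3 + kappa4 / kappa2 ** 2   # mu_4 / mu_2^2, since mu_4 = kappa_4 + 3 kappa_2^2

print(f"kappa_2 = {kappa2:.6f}, kappa_3 = {kappa3:.6f}, kappa_4 = {kappa4:.6f}")
print(f"beta_1 = {beta1:.3f}, beta_2 = {beta2:.1f}")
```

Comparing these with the tabulated values for $n = 1000$ (0.618 and 4.088) shows how far the finite-sample distribution still is from its limit.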



These are evidently far from equal to the values for $n = 1000$ given above. Clearly the limiting form is an inadequate approximation for values of $n$ much higher than 1000.


9.20. The problem of bridging the gap between Tippett's values and the limiting form has been considered by Fisher and Tippett (1928), and the argument which they employ is interesting. Concentrating for a moment on the upper value, we note that the upper member of a sample of $hn$ members is the upper member of a sample of $h$ of upper members of samples of $n$. Both distributions will tend to the same limiting form, if it exists; and consequently the limiting value must be such that the extreme member of a sample of $n$ from it must itself have that distribution. That is to say, if $F$ is the probability of an observation being less than $x$,

$$F^n(x) = F(a_nx + b_n), \qquad (9.51)$$

where $a_n$ and $b_n$ are functions of $n$.

It may be shown from this equation that $F$ must be one of three forms:

$$dF = \exp(-x - e^{-x})\,dx \qquad (9.52)$$

$$dF = \frac{k}{x^{k+1}}\exp(-x^{-k})\,dx \qquad (9.53)$$

$$dF = k(-x)^{k-1}\exp\{-(-x)^k\}\,dx. \qquad (9.54)$$

The first we have already reached. The second and third arise if the original distribution,
instead of tending to infinity exponentially, tends less rapidly such that

lim (1 •— F)x^ exists and is not zero. 

—► 0 

The distribution (9.54) itself has (9.52) as a limiting form as $k$ tends to infinity. It
has therefore been proposed as a "penultimate" form, to bridge the gap between $n = 1000$
and n = 10^*, which is apparently the first point at which the ultimate form provides a 
reasonable approximation. For the penultimate form we have 


$$\mu'_r = \int_{-\infty}^{0} k(-x)^{k-1}x^r e^{-(-x)^k}\,dx$$

and on putting $-x = t^{1/k}$,

$$\mu'_r = \int_0^{\infty}(-1)^r t^{r/k}e^{-t}\,dt = (-1)^r\,\Gamma\Big(1 + \frac{r}{k}\Big).$$

The following values illustrate the relationship between the known form ($n$ = 500, 1000) and the penultimate form for two convenient values of $k$:

                        Standard Deviation             β1                      β2
    1/k        n       Penultimate   Actual    Penultimate   Actual    Penultimate   Actual
    0.0768    1000       0.3433      0.3514       0.548      0.618       3.862       4.088
    0.0845     500       0.3604      0.3704       0.498      0.570       3.761       4.003

Distribution of Range

9.21. The range is the difference of the highest and the lowest value of a sample, and the simultaneous distribution of top and bottom values is, from (9.23),

$$dF \propto (F_1 - F_2)^{n-2}\,dF_1\,dF_2$$

$$= n(n - 1)(F_1 - F_2)^{n-2}\,dF_1\,dF_2. \qquad (9.55)$$

The distribution function of the range $w$ is then given by integrating this distribution over values of $F_1$ and $F_2$ such that $x_1 - x_2 < w$. So far as I am aware, it is not known whether limiting forms of this distribution exist or what they are. It is, however, evident that for large $n$ the range is also large, and it seems doubtful whether the difference of two variates which (for an unlimited curve) tend to $+\infty$ and $-\infty$ respectively has any general limiting form. In any case one would suspect that the limiting form is reached slowly.

For particular cases equation (9.55) is soluble explicitly. The normal case has been fairly completely studied by Tippett (1925) and E. S. Pearson (1926 and 1932). Tippett found the first four moments of the distribution of the range, tabulated the mean values for values of $n$ up to 1000 and gave a diagram for determining standard errors. (These tables and diagram are reproduced in Tables for Statisticians and Biometricians, Part II.) Briefly, his approach is as follows:

From (9.55) we have, for the mean range $E(w)$,

$$E(w) = n(n - 1)\int_{-\infty}^{\infty} dF_1\int_{-\infty}^{x_1}(F_1 - F_2)^{n-2}(x_1 - x_2)\,dF_2. \qquad (9.56)$$

On expanding (j^i -- we got terms under the second integral sign like 


-S + 


if 


E,«^td,r, = J^say. 

— 00 o "i" 1 


Then 


Too riJ F 1 rw 

But Uk+i rfE, = - - --' - f J’ dx,. 

J - 00 L ^ i ^ J —w ^ i — oo 


Hence 


71-2 


Ei.) =. "'Zf* <' - a,, 

$$= \int_{-\infty}^{\infty}\{1 - (1 - F_1)^n - F_1^n\}\,dx_1. \qquad (9.57)$$

In a similar way it is found that

$$\operatorname{var}(w) = 2\int_{-\infty}^{\infty}\int_{-\infty}^{x_1}\{1 - F_1^n - (1 - F_2)^n + (F_1 - F_2)^n\}\,dx_2\,dx_1 - \{E(w)\}^2. \qquad (9.58)$$
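
The single integral for the mean range lends itself to direct numerical evaluation. The sketch below assumes (9.57) in the form $E(w) = \int\{1 - (1 - F)^n - F^n\}\,dx$ for a standard normal parent and uses a plain trapezoidal rule; the function name, the integration limits and the step count are choices made here for illustration.

```python
import math
from statistics import NormalDist

def mean_range_normal(n: int, lower: float = -10.0, upper: float = 10.0, steps: int = 20000) -> float:
    """Mean range in samples of n from a standard normal parent, by the trapezoidal rule."""
    parent = NormalDist()
    h = (upper - lower) / steps
    total = 0.0
    for i in range(steps + 1):
        F = parent.cdf(lower + i * h)
        integrand = 1.0 - (1.0 - F) ** n - F ** n
        weight = 0.5 if i in (0, steps) else 1.0   # end points count half
        total += weight * integrand
    return total * h

mean_range_2 = mean_range_normal(2)     # should equal 2 / sqrt(pi)
mean_range_10 = mean_range_normal(10)

print(f"n = 2:  {mean_range_2:.4f}")
print(f"n = 10: {mean_range_10:.4f}")
```

For $n = 2$ the range is the absolute difference of two observations, whose mean is $2\sigma/\sqrt{\pi}$, which provides a check on the integration.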





This equation was used by Tippett to obtain values of the variance for $n$ up to 1000. The following values illustrate the general behaviour of the distribution:

       n     Standard Deviation     β1 (approximate)     β2 (approximate)
       2           0.853                 0.99                 3.87
      10           0.797                 0.16                 3.15
     100           0.605                 0.21                 3.38
     500           0.524                 0.29                 3.50
    1000           0.497                 0.31                 3.54

Again it would appear that as $n$ increases, the distribution of range diverges more and more from the normal form.

The distribution function of the range in normal samples has recently been tabulated 
by E. S. Pearson and Hartley (1942). 


List of Standard Errors of Commonly Occurring Statistics 

9.22. In view of the general utility of the standard error it may be convenient to 
bring together at this point for reference a number of sampling variances and other results. 
Some of these have already been obtained in this chapter ; others are direct consequences 
of the formulae or methods developed ; and some will be proved later in the book. 

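Mean. $\operatorname{var}(m'_1) = \dfrac{\sigma^2}{n}$, where $\sigma$ is the standard deviation of the parent. This is true in particular for a normal parent. The mean is always estimated from the mean of the sample.

Variance. $\operatorname{var}(m_2) = \dfrac{\mu_4 - \mu_2^2}{n}$. For the normal parent $\operatorname{var}(m_2) = \dfrac{2\sigma^4}{n}$. Tables are given for this case in T.S.B. I*. These results are appropriate to the case where the variance is estimated from the sample variance. For numerical results for other cases see Davies and E. S. Pearson (1934).

Standard Deviation. $\operatorname{var}(s) = \dfrac{\mu_4 - \mu_2^2}{4n\mu_2}$. For normal parent $\operatorname{var}(s) = \dfrac{\sigma^2}{2n}$. These are the values for estimates from the square root of the sample variance. See previous note on variance.

Third Moment about the Mean. $\operatorname{var}(m_3) = \dfrac{\mu_6 - \mu_3^2 - 6\mu_4\mu_2 + 9\mu_2^3}{n}$. For normal parent $\operatorname{var}(m_3) = \dfrac{6\sigma^6}{n}$. The third and higher moments are always estimated from the moments of the sample.

Fourth Moment about the Mean. $\operatorname{var}(m_4) = \dfrac{\mu_8 - \mu_4^2 - 8\mu_5\mu_3 + 16\mu_2\mu_3^2}{n}$. For normal parent $\operatorname{var}(m_4) = \dfrac{96\sigma^8}{n}$.

Coefficient of Variation. $\operatorname{var}(V) = \dfrac{V^2}{n}\Big\{\dfrac{\mu_4 - \mu_2^2}{4\mu_2^2} - \dfrac{\mu_3}{\mu_2\mu'_1} + \dfrac{\mu_2}{\mu'^{2}_1}\Big\}$. For normal parent $\operatorname{var}(V) = \dfrac{V^2}{2n}$ approximately. Tables given in T.S.B. I.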

* An abbreviation for Tables for Statisticians and Biometricians, Part I.





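$\beta_1$. $\operatorname{var}(\beta_1) = \dfrac{\beta_1}{n}(4\beta_4 - 24\beta_2 + 36 + 9\beta_1\beta_2 - 12\beta_3 + 35\beta_1)$. For normal parent
$\operatorname{var}(\sqrt{\beta_1}) = \dfrac{6}{n}$. Tables given in T.S.B. I. The distribution is fairly skew for moderately large
n and the methods of Chapter 11 provide better tests of $\beta_1$ as a measure of departure from
normality. See 11.23. (The $\beta$'s are defined in equation (3.66).)

$\beta_2$. $\operatorname{var}(\beta_2) = \dfrac{1}{n}(\beta_6 - 4\beta_2\beta_4 + 4\beta_2^3 - \beta_2^2 + 16\beta_2\beta_1 - 8\beta_3 + 16\beta_1)$. Tables given in
T.S.B. I.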

Pearson Measure of Skewness (Equation (3.64)). Tables given in T.S.B. I. Probably
skew for moderate n. See note on $\beta_1$.

Pearson Mode. Formulae and tables given in Yasukawa (1926), the results of course
being only applicable to modes calculated from the Pearson formula (equation (3.62)).
Distribution may be skew for moderate n.

Coefficient of Contingency. See 13.14.

Coefficient of Association. See 13.8.

Tetrachoric r. See 14.28.

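Mean Deviation. General formulae not known. See 9.13. For normal parent
$\operatorname{var}(\text{m.d.}) = \dfrac{\sigma^2}{n}\left(1 - \dfrac{2}{\pi}\right)$.

Gini's Mean Difference. See 9.14. For normal case $\operatorname{var}(\Delta_1) = \dfrac{(0.8068)^2\sigma^2}{n}$.

Median. $\operatorname{var}(\text{median}) = \dfrac{1}{4ny_0^2}$, where $y_0$ is the median ordinate of the sample. For
normal parent $\operatorname{var}(\text{median}) = \dfrac{(1.2533)^2\sigma^2}{n}$. For small samples from normal population, tables
and formulae given in Hojo (1931). Results to higher order in n given by K. Pearson (1931).

Quartiles. $\operatorname{var}(Q) = \dfrac{3}{16ny^2}$, where $y$ is ordinate at the quartile concerned. For normal
parent, $\operatorname{var}(Q) = \dfrac{(1.3626)^2\sigma^2}{n}$. Results for small samples from normal population given in
Hojo (1931).

Semi-interquartile range. $\operatorname{var}(\text{s.i.q.}) = \dfrac{1}{4n}\left(\dfrac{3}{16y_1^2} + \dfrac{3}{16y_2^2} - \dfrac{1}{8y_1y_2}\right)$, where $y_1$, $y_2$ are the
quartile ordinates. For normal parent $\operatorname{var}(\text{s.i.q.}) = \dfrac{(0.7867)^2\sigma^2}{n}$.

Deciles. For the normal parent, variances are

      $\dfrac{(1.2680)^2\sigma^2}{n}$ for deciles 4, 6,
      $\dfrac{(1.3180)^2\sigma^2}{n}$ for deciles 3, 7,
      $\dfrac{(1.4288)^2\sigma^2}{n}$ for deciles 2, 8,
      $\dfrac{(1.7094)^2\sigma^2}{n}$ for deciles 1, 9.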




Range. See 9.21.

Correlation Coefficient. See 14.10. For normal case $\operatorname{var}(r) = \dfrac{(1 - \rho^2)^2}{n}$. But it
is better to use Fisher's transformation (14.18) or the Tables by David (1938).

Coefficient of Regression. See 14.10 and 14.11. For normal case $\operatorname{var}(b_1) = \dfrac{\sigma_1^2(1 - \rho^2)}{n\sigma_2^2}$.
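
The numerical constants quoted above for the median, quartiles and deciles of a normal parent can be checked with a short computation. The sketch below is illustrative only: it assumes the general large-sample form var = p(1 - p)/(n y^2) for the quantile of order p, of which the median and quartile formulae in the list are the cases p = 1/2 and p = 1/4; the function name `quantile_se_constant` is hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def quantile_se_constant(p: float) -> float:
    """Return c such that the large-sample standard error of the p-quantile
    is c * sigma / sqrt(n) for a normal parent.

    Assumes var = p(1 - p) / (n * y**2), y being the ordinate of the
    standardised normal frequency function at the quantile.
    """
    std_normal = NormalDist()        # zero mean, unit standard deviation
    x_p = std_normal.inv_cdf(p)      # abscissa of the quantile
    y = std_normal.pdf(x_p)          # ordinate at that abscissa
    return sqrt(p * (1.0 - p)) / y


# Median, quartile and the first three deciles.
for p in (0.5, 0.25, 0.3, 0.2, 0.1):
    print(p, round(quantile_se_constant(p), 4))
```

By symmetry the constant for the quantile of order 1 - p equals that for order p, which is why the deciles are listed in pairs.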




Standard Errors of Sums and Differences 

9.23. Suppose we have two variables $x_1$, $x_2$, which may or may not be independent.
We have, if $z$ is their sum,

$$E(z) = E(x_1) + E(x_2),$$

or the mean of $z$ is the sum of the means of $x_1$ and $x_2$. If then we measure $x_1$ and $x_2$ about
their respective means, the mean of $z$ is zero and thus

$$\operatorname{var} z = E(z^2) = E(x_1 + x_2)^2 = E(x_1^2) + 2E(x_1x_2) + E(x_2^2) = \operatorname{var} x_1 + 2\operatorname{cov}(x_1, x_2) + \operatorname{var} x_2. \tag{9.59}$$

Similarly for the difference of two variables we have

$$\operatorname{var} z = \operatorname{var} x_1 - 2\operatorname{cov}(x_1, x_2) + \operatorname{var} x_2. \tag{9.60}$$

In particular, if $x_1$ and $x_2$ are independent their covariance vanishes, for it becomes
the product of the two means, each of which is zero. In this important case we have, for
the sum,

$$\operatorname{var}(x_1 + x_2) = \operatorname{var} x_1 + \operatorname{var} x_2 \tag{9.61}$$

and for the difference

$$\operatorname{var}(x_1 - x_2) = \operatorname{var} x_1 + \operatorname{var} x_2. \tag{9.62}$$

These results are of fundamental importance: the variance of the sum or difference
of two independent random variables is the sum of their variances. Generally if

$$z = a_1x_1 + a_2x_2 + \dots + a_nx_n$$

and the n variables are independent,

$$\operatorname{var} z = a_1^2\operatorname{var} x_1 + a_2^2\operatorname{var} x_2 + \dots + a_n^2\operatorname{var} x_n. \tag{9.63}$$

In particular we have, for the sampling variance of the difference of the means of two
independent samples, say $m_1'$ and $p_1'$,

$$\operatorname{var}(m_1' - p_1') = \frac{\mu_2}{n_1} + \frac{\varpi_2}{n_2}, \tag{9.64}$$

$\mu_2$ and $\varpi_2$ being the respective variances and $n_1$, $n_2$ the respective numbers in the samples.

Example 9.10

A random sample of 1,000 men from the North of England shows their mean wage
to be 47 shillings a week with a standard deviation of 28 shillings. A random sample of
1,500 men from the South gives a mean wage of 49 shillings a week with a standard
deviation of 40 shillings. Required to discuss the question whether the mean level of wages
differs between North and South.

The difference of the means is 2 shillings and we wish to know whether this is significant.





From (9.64), taking as usual with large samples the unknown variances to be those of the
samples, we find

$$\operatorname{var}(\text{difference}) = \frac{28^2}{1000} + \frac{40^2}{1500} = 1.851.$$

The standard error is thus 1.36 and the difference in means, being less than twice this
amount, is hardly significant of any real difference. Had the difference been three shillings
instead of two we should probably have concluded that the difference, being more than
twice the standard error, was significant.
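
The arithmetic of this comparison may be sketched as follows. This is only an illustration of (9.64) with the sample variances substituted for the unknown parent variances; the function name `se_difference_of_means` is a hypothetical helper, not anything defined in the text.

```python
from math import sqrt


def se_difference_of_means(sd1: float, n1: int, sd2: float, n2: int) -> float:
    """Standard error of the difference of the means of two independent
    samples, from var = sd1^2/n1 + sd2^2/n2 as in (9.64)."""
    return sqrt(sd1 ** 2 / n1 + sd2 ** 2 / n2)


# Figures of Example 9.10: North (1,000 men) and South (1,500 men).
se = se_difference_of_means(28.0, 1000, 40.0, 1500)
difference = 49.0 - 47.0
ratio = difference / se   # observed difference in units of its standard error

print(round(se, 2), round(ratio, 2))
```

The ratio falls short of 2, which is the rough criterion of significance used in the example.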

There is an alternative approach to this problem which is worth noticing. Suppose
we assume as our hypothesis under test that the distribution of wages in the two areas
is the same. The difference in the variances makes this rather unlikely, but on that
assumption we may combine the sample figures to give a new estimate of the mean and
variance in this distribution, e.g. the mean might be taken to be given by

$$\frac{(1000 \times 47) + (1500 \times 49)}{2500} = 48.2 \text{ shillings.}$$

In the first sample the sum of squares of deviations about the mean 47 is

$$1000 \times 28^2 = 784{,}000,$$

and hence the sum about the origin is $784{,}000 + (47^2 \times 1000) = 2{,}993{,}000$. Similarly
in the second sample the sum of squares of deviations about the origin is

$$1500\,(40^2 + 49^2) = 6{,}001{,}500.$$

The second moment of the whole about the origin is then $\dfrac{8{,}994{,}500}{2500} = 3597.8$, and hence
the variance is $3597.8 - (48.2)^2 = 1274.56$. We might take this as our estimate of the
variance in the population and our problem would then be: does the mean in one of the
parts of the whole sample, say the first, 47 shillings, differ significantly from the mean
of the whole, 48.2 shillings?

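Now at first sight it looks as if this is a case for the application of (9.64). We have
two means, 47 and 48.2, with respective variances 784 and 1274.56, and require to know
whether the means are significantly different. But the samples are no longer independent,
for one of them is part of the other, and a modified formula must be used. If the means of
the separate samples are $\dfrac{1}{n_1}\Sigma x_1$ and $\dfrac{1}{n_2}\Sigma x_2$, the mean of the two together
is given by

$$\frac{\Sigma x_1 + \Sigma x_2}{n_1 + n_2}.$$

The difference of $\dfrac{1}{n_1}\Sigma x_1$ and this quantity, say $q$, is then

$$q = \frac{1}{n_1}\Sigma x_1 - \frac{\Sigma x_1 + \Sigma x_2}{n_1 + n_2} = \frac{1}{n_1 + n_2}\left\{\frac{n_2}{n_1}\Sigma x_1 - \Sigma x_2\right\}.$$

Thus

$$\operatorname{var} q = \frac{1}{(n_1 + n_2)^2}\left\{\frac{n_2^2}{n_1^2}\operatorname{var}(\Sigma x_1) - \frac{2n_2}{n_1}\operatorname{cov}(\Sigma x_1, \Sigma x_2) + \operatorname{var}(\Sigma x_2)\right\}.$$

Since $x_1$ and $x_2$ are independent, $\operatorname{var}(\Sigma x_1) = n_1\mu_2$, $\operatorname{var}(\Sigma x_2) = n_2\mu_2$ and the covariance
vanishes, so that this reduces to

$$\operatorname{var} q = \frac{\mu_2 n_2}{n_1(n_1 + n_2)}.$$

In our case $n_1 = 1000$, $n_2 = 1500$ and our estimate of $\mu_2$ is 1274.56. The variance of the
difference then becomes, on substitution, 0.7647. The observed difference is 48.2 - 47 = 1.2.
Once again this is less than twice the standard error ($= \sqrt{0.7647} = 0.87$) and again we
conclude that the difference is not significant.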
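
The pooled calculation can be retraced step by step. The sketch below merely repeats the arithmetic of the example under the stated hypothesis of a common distribution; all variable names are illustrative.

```python
from math import sqrt

# Sample figures of Example 9.10 (numbers, means, standard deviations).
n1, mean1, sd1 = 1000, 47.0, 28.0
n2, mean2, sd2 = 1500, 49.0, 40.0

# Mean of the combined sample.
pooled_mean = (n1 * mean1 + n2 * mean2) / (n1 + n2)

# Sum of squares about the origin: n * (variance + mean^2) for each sample.
sum_sq_origin = n1 * (sd1 ** 2 + mean1 ** 2) + n2 * (sd2 ** 2 + mean2 ** 2)

# Variance of the combined sample, taken as the estimate of mu_2.
pooled_var = sum_sq_origin / (n1 + n2) - pooled_mean ** 2

# Variance of q, the difference between the first mean and the combined mean.
var_q = pooled_var * n2 / (n1 * (n1 + n2))
se_q = sqrt(var_q)

print(round(pooled_mean, 1), round(pooled_var, 2), round(var_q, 4), round(se_q, 2))
```

Note that the naive application of (9.64) to the two means 47 and 48.2 would overstate the variance, because it ignores the fact that the first sample is part of the whole.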


REFERENCES 

Davies, O. L., and Pearson, E. S. (1934), “Methods of estimating from samples the
population standard deviation,” Supp. Jour. Roy. Stat. Soc., 1, 76.

Fisher, R. A. (1920), “A mathematical examination of the methods of determining the
accuracy of an observation by the mean error and the mean square error,” Monthly
Notices Roy. Astr. Soc., 80, 758.

--- and Tippett, L. H. C. (1928), “Limiting forms of the frequency distribution of the
largest or smallest member of a sample,” Proc. Camb. Phil. Soc., 24, 180.

Gumbel, E. J. (1934), “Les valeurs extrêmes des distributions statistiques,” Annales de
l'Institut Henri Poincaré, 5, 115.

Hartley, H. O. (1942), “The range in normal samples,” Biometrika, 32, 334.

Helmert (1876), Astronomische Nachrichten, 88, No. 2096.

Hojo, T. (1931), “Distribution of the Median, Quartiles and Interquartile distance in
samples from a normal population,” Biometrika, 23, 315.

Kondo, T. (1929), “On the standard error of the mean square contingency,” Biometrika,
21, 376.

Nair, U. S. (1936), “The standard error of Gini's mean difference,” Biometrika, 28, 428.

Pearson, E. S. (1926), “A further note on the distribution of range in samples taken from
a normal population,” Biometrika, 18, 173.

--- and Adyanthaya, N. K. (1928), “The distribution of frequency constants in small
samples from non-normal symmetrical and skew populations,” Biometrika, 20A,
356.

--- (1932), “The percentage limits of the distribution of range in samples from a normal
population,” Biometrika, 24, 404.

--- and Haines, Joan (1935), “The use of range in place of standard deviation in small
samples,” Supp. Jour. Roy. Stat. Soc., 2, 83.

--- and Hartley, H. O. (1942), “The probability integral of the range in samples of n
observations from a normal population,” Biometrika, 32, 301.

Pearson, K., and Filon, L. N. G. (1898), “On the probable errors of frequency constants
and on the influence of random selection on variation and correlation,” Phil.
Trans., 191A, 229.





Pearson, K., “On the probable errors of frequency constants,” Part I, Biometrika, 1903,
2, 273; Part II, Biometrika, 1913, 9, 1; Part III, Biometrika, 1920, 113.

--- (1913), “On the probable error of a coefficient of correlation as found from a
four-fold table,” Biometrika, 9, 22.

--- (1915), “On the probable error of a coefficient of mean square contingency,”
Biometrika, 10, 590.

--- (1931), “On the standard error of the median to a third approximation,” Biometrika,
23, 361.

--- and Pearson, M. V. (1931), “On the mean character and variance of a ranked
individual and on the mean and variance of the intervals between ranked individuals,”
Biometrika, 23, 364.

Tippett, L. H. C. (1925), “On the extreme individuals and the range of samples taken
from normal population,” Biometrika, 17, 364.

Yasukawa, K. (1926), “On the probable error of the mode of skew frequency distributions,”
Biometrika, 18, 263.


EXERCISES 


9.1. Show that the mean value of the variance is given exactly by

$$E(m_2) = \frac{n - 1}{n}\mu_2$$

and that its variance is given exactly by

$$\operatorname{var}(m_2) = \left(\frac{n - 1}{n}\right)^2\frac{\mu_4 - \mu_2^2}{n} + \frac{2(n - 1)}{n^3}\mu_2^2.$$

Hence verify that the formulae of this chapter as applied to the variance of a sample
are accurate to order $n^{-1}$.


9.2. In the height distribution of Table 1.7 it has been found that

      $m_2 = 6.616$
      $m_3 = -0.207$
      $m_4 = 137.089$.

Regarding the distribution as a random sample from a population which is approximately
normal, show that $m_3$ does not differ significantly from zero (which, of course, must be so
if the assumption of normality is to be maintained) and that $m_4$ has a standard error of
about 4 per cent. of its value.

9.3. Verify that the standard error of the first decile in samples from a normal
population is $\dfrac{1.709\sigma}{\sqrt{n}}$.

9.4. In the distribution of Australian marriages of Table 1.8 it has been found that
the mean is 29.4 years, the standard deviation 8 years approximately. The median
frequency is about 63,150. Taking this distribution to be a random sample, show that the
standard error of the mean is 0.015 years and that of the median 0.043 years.





9.5. If a series of random samples of different sizes is drawn from a population in
which the proportion of members bearing an attribute A is $\varpi$, show that the variance of the
proportions of A in such sets is $\dfrac{\varpi(1 - \varpi)}{H}$, where $H$ is the harmonic mean of the numbers
in the samples.


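9.6. Show that the sampling variances of the first four cumulants, as calculated
from the moments, are given to order $n^{-1}$ by

$$\operatorname{var}\kappa_1 = \frac{1}{n}\kappa_2$$

$$\operatorname{var}\kappa_2 = \frac{1}{n}(\kappa_4 + 2\kappa_2^2)$$

$$\operatorname{var}\kappa_3 = \frac{1}{n}(\kappa_6 + 9\kappa_4\kappa_2 + 9\kappa_3^2 + 6\kappa_2^3)$$

$$\operatorname{var}\kappa_4 = \frac{1}{n}(\kappa_8 + 16\kappa_6\kappa_2 + 48\kappa_5\kappa_3 + 34\kappa_4^2 + 72\kappa_4\kappa_2^2 + 144\kappa_3^2\kappa_2 + 24\kappa_2^4).$$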


9.7. If the variate range is divided into sub-ranges and the frequency of a large
sample of n falling into the pth range is $f_p$, show that

$$\operatorname{var} f_p = f_p\left(1 - \frac{f_p}{n}\right), \qquad \operatorname{cov}(f_p, f_q) = -\frac{f_pf_q}{n},$$

and hence find expressions for the sampling variance of the rth moment about an arbitrary
point.


9.8. Show that in odd samples of n from a rectangular population of unit range
the sampling variance of the distribution of the median is given exactly by $\dfrac{1}{4(n + 2)}$.



CHAPTER 10 


EXACT SAMPLING DISTRIBUTIONS 

10.1. The role of the sampling distribution in statistical inference has been indicated
in Chapter 8. In the present chapter we propose to give an account of the main methods
of finding such distributions when the population from which the sample was derived is
specified. It will, as usual, be assumed that the sampling is simple and random. Thus,
if the parent distribution is $dF(x)$ the simultaneous distribution of n values $x_1 \dots x_n$ is
$dF(x_1)\,dF(x_2) \dots dF(x_n)$; and if $z$ is a statistic

$$z = z(x_1 \dots x_n) \tag{10.1}$$

the distribution function of $z$ is given by

$$F(z_0) = \int \cdots \int dF(x_1) \dots dF(x_n) \tag{10.2}$$

the integration being taken over the domain of the $x$'s such that $z(x_1 \dots x_n) \leq z_0$.

Formally, (10.2) is the solution of our problem, which thus reduces to the purely 
mathematical one of evaluating certain multiple integrals or sums. The methods with 
which we are here concerned are fundamentally devices of various kinds to facilitate the 
integrative process. They may be classified into four groups :— 

(a) straightforward evaluation of the integral (10.2) by ordinary analytical processes 

such as a convenient change of variable ; 

(b) the use of geometrical terminology to effect the same object and to avoid cumbrous 

analytical formulae; 

(c) the use of characteristic functions; and 

(d) other analytical methods, including mathematical induction. 


10.2. As an illustration of the straightforward analytical approach, let us find the
distribution of the sums of squares of n independent variables, each of which is distributed
normally with unit variance and zero mean. The joint distribution of the n variables is
then the product of n quantities of type $\dfrac{1}{\sqrt{(2\pi)}}e^{-\frac{1}{2}x^2}\,dx$, that is to say

$$dF = \frac{1}{(2\pi)^{n/2}}\exp\{-\tfrac{1}{2}(x_1^2 + x_2^2 + \dots + x_n^2)\}\,dx_1 \dots dx_n. \tag{10.3}$$

We require the sampling distribution of

$$z = x_1^2 + x_2^2 + \dots + x_n^2. \tag{10.4}$$

We have thus to evaluate the multiple integral

$$I = \int \cdots \int \frac{1}{(2\pi)^{n/2}}\exp(-\tfrac{1}{2}\Sigma x^2)\,dx_1 \dots dx_n$$

over the domain of $x$'s conditioned by (10.4).


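Make the transformation to variables $z, \theta_1, \theta_2, \dots \theta_{n-1}$

$$\left.\begin{aligned}
x_1 &= z^{\frac{1}{2}}\cos\theta_1\cos\theta_2 \cdots \cos\theta_{n-1} \\
x_2 &= z^{\frac{1}{2}}\cos\theta_1\cos\theta_2 \cdots \cos\theta_{n-2}\sin\theta_{n-1} \\
x_j &= z^{\frac{1}{2}}\cos\theta_1\cos\theta_2 \cdots \cos\theta_{n-j}\sin\theta_{n-j+1} \\
x_n &= z^{\frac{1}{2}}\sin\theta_1
\end{aligned}\right\} \tag{10.5}$$

The Jacobian of this transformation is given by

$$\frac{\partial(x_1, \dots x_n)}{\partial(z, \theta_1, \dots \theta_{n-1})}$$

which is equal to $\tfrac{1}{2}z^{\frac{n-2}{2}}$ times the determinant

$$\begin{vmatrix}
\cos\theta_1\cos\theta_2 \cdots \cos\theta_{n-1} & \cos\theta_1\cos\theta_2 \cdots \cos\theta_{n-2}\sin\theta_{n-1} & \cdots & \sin\theta_1 \\
-\sin\theta_1\cos\theta_2 \cdots \cos\theta_{n-1} & -\sin\theta_1\cos\theta_2 \cdots \cos\theta_{n-2}\sin\theta_{n-1} & \cdots & \cos\theta_1 \\
-\cos\theta_1\sin\theta_2 \cdots \cos\theta_{n-1} & -\cos\theta_1\sin\theta_2 \cdots \cos\theta_{n-2}\sin\theta_{n-1} & \cdots & 0 \\
\vdots & \vdots & & \vdots \\
-\cos\theta_1\cos\theta_2 \cdots \sin\theta_{n-1} & +\cos\theta_1\cos\theta_2 \cdots \cos\theta_{n-1} & \cdots & 0
\end{vmatrix}$$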

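Taking out common factors in columns we find that this determinant is equal to

$$\cos^{n-1}\theta_1\cos^{n-2}\theta_2 \cdots \cos\theta_{n-1}\,\sin\theta_1\sin\theta_2 \cdots \sin\theta_{n-1}
\begin{vmatrix}
1 & 1 & 1 & \cdots & 1 \\
-\tan\theta_1 & -\tan\theta_1 & -\tan\theta_1 & \cdots & \cot\theta_1 \\
-\tan\theta_2 & -\tan\theta_2 & -\tan\theta_2 & \cdots & 0 \\
\vdots & \vdots & \vdots & & \vdots \\
-\tan\theta_{n-2} & -\tan\theta_{n-2} & \cot\theta_{n-2} & \cdots & 0 \\
-\tan\theta_{n-1} & \cot\theta_{n-1} & 0 & \cdots & 0
\end{vmatrix}$$

and, on subtracting each column from the preceding one, the determinant is found to reduce
to $\cos^{n-2}\theta_1\cos^{n-3}\theta_2 \cdots \cos\theta_{n-2}$.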

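Thus our integral becomes

$$\int \cdots \int \frac{1}{(2\pi)^{n/2}}e^{-\frac{1}{2}z}\,\tfrac{1}{2}z^{\frac{n-2}{2}}\cos^{n-2}\theta_1 \cdots \cos\theta_{n-2}\,dz\,d\theta_1 \dots d\theta_{n-1}. \tag{10.6}$$

The advantage of the transformation is that the limits of the variables are now much
simpler. $z$ itself can vary from 0 to $z$ and the $\theta$'s from 0 to $\pi$ or $2\pi$. Thus the integral
(10.6) divides into a product of integrals, those in $\theta$ being constant, and we find for our
distribution function of $z$

$$F(z) = k\int_0^z e^{-\frac{1}{2}z}z^{\frac{n-2}{2}}\,dz. \tag{10.7}$$

The constant $k$ may be evaluated by integration between 0 and $\infty$ and we have

$$1 = k\int_0^\infty e^{-\frac{1}{2}z}z^{\frac{n-2}{2}}\,dz = k\,2^{\frac{n}{2}}\,\Gamma\!\left(\frac{n}{2}\right).$$

Hence the distribution sought is

$$dF = \frac{1}{2^{\frac{n}{2}}\Gamma\!\left(\frac{n}{2}\right)}e^{-\frac{1}{2}z}z^{\frac{n-2}{2}}\,dz, \qquad 0 \leq z \leq \infty, \tag{10.8}$$

a Pearson Type III curve.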
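
A numerical check of this result may be found useful. The sketch below evaluates the frequency function $e^{-z/2}z^{(n-2)/2}/\{2^{n/2}\Gamma(n/2)\}$ and verifies by a simple midpoint rule that it integrates to unity and has mean n; the function names and the choice n = 5 are illustrative assumptions, and the upper limit 100 merely stands in for infinity.

```python
from math import exp, lgamma, log


def sum_of_squares_density(z: float, n: int) -> float:
    """Frequency function of z = x_1^2 + ... + x_n^2 for unit normal x's."""
    if z <= 0.0:
        return 0.0
    # Work with logarithms to avoid overflow in the gamma function.
    log_f = -0.5 * z + (0.5 * n - 1.0) * log(z) - 0.5 * n * log(2.0) - lgamma(0.5 * n)
    return exp(log_f)


def midpoint_integral(func, lower: float, upper: float, steps: int = 200_000) -> float:
    """Midpoint-rule approximation to the integral of func over (lower, upper)."""
    width = (upper - lower) / steps
    return width * sum(func(lower + (i + 0.5) * width) for i in range(steps))


n = 5
total = midpoint_integral(lambda z: sum_of_squares_density(z, n), 0.0, 100.0)
mean = midpoint_integral(lambda z: z * sum_of_squares_density(z, n), 0.0, 100.0)
print(total, mean)
```

The mean n is what would be expected directly, since each squared variable has mean unity.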


10.3. The essential feature of the change of variables is the simplification of the
domain of integration as defined by the limits of the new variables. In general, we usually
take the statistic whose sampling distribution is being sought to form one of the new variables
and choose n - 1 others in any way which may be convenient to the particular problem.
Then, if $J$ is the Jacobian of the transformation, namely

$$J = \frac{\partial(x_1, \dots x_n)}{\partial(z, \theta_1, \dots \theta_{n-1})},$$

the integral (10.2) becomes

$$\int \cdots \int f(x_1) \dots f(x_n)\,J\,dz\,d\theta_1 \dots d\theta_{n-1}, \tag{10.9}$$

$f(x_j)$ being the frequency function of the parent and $x_j$ being expressed in terms of $z$ and
the $\theta$'s. The integration now takes place with respect to the $\theta$'s, which can usually be
chosen so as to vary between limits which are independent of $z$; and thus the indefinite
integral (10.2) is replaced by more easily calculable definite integrals.

As always in such cases $J$ is subject to an ambiguity of sign which must be determined
so as to make the transformed integral positive. The validity of the variate-transformation
depends on the familiar conditions governing the change of variable in a multiple integral.
For example, it is a sufficient condition that the new variables and their first derivatives
shall be continuous in the $x$'s and that $J$ does not change sign in the domain of integration.*
Some further examples will make the general type of investigation clear.


Example 10.1

To find the distribution of the mean of a sample of n values $x_1 \dots x_n$ from the
distribution

$$dF = \frac{dx}{\pi(1 + x^2)}, \qquad -\infty \leq x \leq \infty.$$

The joint distribution is

$$\frac{1}{\pi^n}\prod_{j=1}^{n}\frac{dx_j}{1 + x_j^2} \tag{10.10}$$

and the statistic $z$ is given by

$$nz = \sum_{j=1}^{n} x_j. \tag{10.11}$$

We have to integrate (10.10) over a domain of $x$'s subject to $\Sigma x \leq nz$. Let us take new
variables $x_1 = x_1$, $x_2 = x_2$, $\dots$ $x_{n-1} = x_{n-1}$ and

$$z = \frac{x_1 + x_2 + \dots + x_n}{n}.$$


* See, for example, de la Vallée Poussin, Cours d'analyse infinitésimale, 1926, vol. 1, para. 285; vol. 2,
para. 18.








Here $J$ is evidently equal to the constant n. Our new variables $x_1 \dots x_{n-1}$ may extend
from $-\infty$ to $+\infty$ and the new variable $z$ from $-\infty$ to $z$. We then have

$$F(z) = \int_{-\infty}^{z} dz \int \cdots \int \frac{n}{\pi^n}\prod_{j=1}^{n-1}\frac{dx_j}{1 + x_j^2}\cdot\frac{1}{1 + (nz - x_1 - \dots - x_{n-1})^2} \tag{10.12}$$

and the frequency function of $z$ is given by the (n - 1)-fold multiple integral in $x_1 \dots x_{n-1}$
in (10.12). This integral may be evaluated by step-by-step integration. We have

$$\frac{1}{(1 + x^2)\{r^2 + (a - x)^2\}} = \frac{1}{\{a^2 + (r + 1)^2\}\{a^2 + (r - 1)^2\}}\left[\frac{2ax}{x^2 + 1} + \frac{a^2 + r^2 - 1}{x^2 + 1} + \frac{2a^2 - 2ax}{r^2 + (a - x)^2} + \frac{a^2 - r^2 + 1}{r^2 + (a - x)^2}\right].$$

Whence, integrating with respect to $x$ from $-\infty$ to $+\infty$, we find on the right

$$\frac{1}{\{a^2 + (r + 1)^2\}\{a^2 + (r - 1)^2\}}\left[a\log(x^2 + 1) - a\log\{r^2 + (a - x)^2\} + (a^2 + r^2 - 1)\tan^{-1}x + \frac{a^2 - r^2 + 1}{r}\tan^{-1}\frac{x - a}{r}\right]_{-\infty}^{\infty}$$

reducing to

$$\frac{\pi(r + 1)}{r\{a^2 + (r + 1)^2\}}. \tag{10.13}$$

Thus in (10.12), taking $x = x_{n-1}$, $r = 1$, $a = nz - x_1 - \dots - x_{n-2}$, we find that the
(n - 1)-fold integral reduces to

$$\frac{n}{\pi^n}\int \cdots \int \prod_{j=1}^{n-2}\frac{dx_j}{1 + x_j^2}\cdot\frac{2\pi}{2^2 + (nz - x_1 - \dots - x_{n-2})^2}.$$

Integrating with respect to $x_{n-2}$, $x_{n-3}$ $\dots$ successively, we reduce this eventually to

$$\frac{n^2}{\pi\{n^2 + (nz)^2\}} = \frac{1}{\pi(1 + z^2)}. \tag{10.14}$$

Thus the distribution of $z$ is given by

$$dF = \frac{dz}{\pi(1 + z^2)}, \qquad -\infty \leq z \leq \infty, \tag{10.15}$$

and is thus the same as that of a single observation.

This is an interesting example of the failure of the Central Limit Theorem, the mean
of samples of n failing to tend to normality for large n. The second moment of the
distribution does not exist.
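
The reduction formula (10.13), on which the step-by-step integration rests, can be verified numerically. The sketch below is illustrative only: it uses the substitution $x = \tan t$ (so that $dx/(1 + x^2) = dt$) to bring the infinite range to $(-\tfrac{1}{2}\pi, \tfrac{1}{2}\pi)$ and then applies a midpoint rule; the function names and the trial values of a and r are assumptions made for the illustration.

```python
from math import pi, tan


def convolution_integral(a: float, r: float, steps: int = 200_000) -> float:
    """Approximate the integral over all x of 1 / [(1 + x^2){r^2 + (a - x)^2}].

    With x = tan t the factor dx / (1 + x^2) becomes dt, leaving the
    integrand 1 / {r^2 + (a - tan t)^2} on (-pi/2, pi/2).
    """
    width = pi / steps
    total = 0.0
    for i in range(steps):
        t = -0.5 * pi + (i + 0.5) * width   # midpoints never reach +-pi/2
        total += 1.0 / (r * r + (a - tan(t)) ** 2)
    return total * width


def closed_form(a: float, r: float) -> float:
    """Right-hand side of (10.13)."""
    return pi * (r + 1.0) / (r * (a * a + (r + 1.0) ** 2))


for a, r in ((0.0, 1.0), (0.7, 3.0), (-2.5, 2.0)):
    print(a, r, convolution_integral(a, r), closed_form(a, r))
```

Repeated use of the formula with r = 1, 2, 3, ... is what carries the scale factor from 1 up to n in the successive integrations.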


Example 10.2

To find the distribution of a linear function of n independent variables $x_1 \dots x_n$
where $x_j$ is distributed normally with zero mean and variance $v_j$.

Let the linear function be

$$z = a_1x_1 + a_2x_2 + \dots + a_nx_n. \tag{10.16}$$

Then by a transformation $\xi_j = \dfrac{x_j}{\sqrt{v_j}}$ we have

$$z = \Sigma a_j\sqrt{v_j}\,\xi_j \tag{10.17}$$

and $\xi_j$ is now distributed with zero mean and unit variance. Our problem is thus equivalent
to finding the distribution of a linear function of variables each of which is normally
distributed with zero mean and unit variance.





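Consider a transformation of type

$$\left.\begin{aligned}
\zeta_1 &= l_1\xi_1 + l_2\xi_2 + \dots + l_n\xi_n \\
\zeta_2 &= m_1\xi_1 + m_2\xi_2 + \dots + m_n\xi_n \\
&\;\;\vdots \\
\zeta_n &= p_1\xi_1 + p_2\xi_2 + \dots + p_n\xi_n
\end{aligned}\right\} \tag{10.18}$$

and let us determine the $l$'s $\dots$ $p$'s such that

$$\left.\begin{aligned}
l_j^2 + m_j^2 + \dots + p_j^2 &= 1, \qquad \text{all } j \\
l_jl_k + m_jm_k + \dots + p_jp_k &= 0, \qquad j \neq k
\end{aligned}\right\} \tag{10.19}$$

This can always be done, for the conditions impose only $n + \tfrac{1}{2}n(n - 1)$ conditions on the
$n^2$ constants.

We have then

$$\sum_{j=1}^{n}\zeta_j^2 = (l_1\xi_1 + \dots + l_n\xi_n)^2 + \dots + (p_1\xi_1 + \dots + p_n\xi_n)^2 = \sum_{j=1}^{n}\xi_j^2$$

in virtue of (10.19). The joint distribution of the $\xi$'s is by hypothesis

$$\frac{1}{(2\pi)^{n/2}}\exp(-\tfrac{1}{2}\Sigma\xi^2)\,\Pi\,d\xi = \frac{1}{(2\pi)^{n/2}}\exp(-\tfrac{1}{2}\Sigma\zeta^2)\,J\,\Pi\,d\zeta \tag{10.20}$$

where

$$J = \frac{\partial\xi}{\partial\zeta} = \frac{1}{\partial\zeta/\partial\xi} = \frac{1}{j}, \text{ say.}$$

The determinant $j$ is then, from (10.18),

$$\begin{vmatrix}
l_1 & l_2 & \cdots & l_n \\
m_1 & m_2 & \cdots & m_n \\
\vdots & \vdots & & \vdots \\
p_1 & p_2 & \cdots & p_n
\end{vmatrix}$$

and multiplying this by the equal determinant

$$\begin{vmatrix}
l_1 & m_1 & \cdots & p_1 \\
l_2 & m_2 & \cdots & p_2 \\
\vdots & \vdots & & \vdots \\
l_n & m_n & \cdots & p_n
\end{vmatrix}$$

we find, in virtue of (10.19), that the product is

$$\begin{vmatrix}
1 & 0 & \cdots & 0 \\
0 & 1 & \cdots & 0 \\
\vdots & \vdots & & \vdots \\
0 & 0 & \cdots & 1
\end{vmatrix} = 1.$$

Thus $j = \pm 1$ and (10.20) becomes

$$\frac{1}{(2\pi)^{n/2}}\exp(-\tfrac{1}{2}\Sigma\zeta^2)\,\Pi\,d\zeta. \tag{10.21}$$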






Now the $\zeta$'s may vary from $-\infty$ to $\infty$, and if we require the distribution of one of the
$\zeta$'s, say $\zeta$ $(= \Sigma l_j\xi_j)$, we have to integrate over all values of $\xi$ such that
$\Sigma l_j\xi_j \leq \zeta_1$. This is equivalent to a range of $\zeta$ from $-\infty$ to $\zeta_1$ and of the other $\zeta$'s from
$-\infty$ to $+\infty$. Thus the integral of (10.21) becomes the product of (n - 1) definite integrals
each equal to $\int_{-\infty}^{\infty} e^{-\frac{1}{2}\zeta^2}\,d\zeta = (2\pi)^{\frac{1}{2}}$ and the integral $\int_{-\infty}^{\zeta_1} e^{-\frac{1}{2}\zeta^2}\,d\zeta$, and hence reduces to

$$F(\zeta_1) = \frac{1}{(2\pi)^{\frac{1}{2}}}\int_{-\infty}^{\zeta_1} e^{-\frac{1}{2}\zeta^2}\,d\zeta. \tag{10.22}$$

In other words, $\zeta$ is distributed normally with unit variance and zero mean. $\zeta$ is an
arbitrary linear function subject to the condition that $\Sigma l^2 = 1$. Referring to (10.17)
we see that the slightly more general linear function $z = \Sigma a_j\sqrt{v_j}\,\xi_j$ will be distributed
normally about zero mean with variance $\Sigma a_j^2v_j$, for then $z/\sqrt{(\Sigma a_j^2v_j)}$ has coefficients
obeying the condition $\Sigma l^2 = 1$ and is distributed with unit variance.
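
The existence of a transformation of type (10.18) satisfying (10.19), with a prescribed first row, can be illustrated by an explicit construction. The sketch below is an assumption-laden illustration rather than the method of the text: it uses a Householder reflection (a technique not mentioned above) to build one orthogonal matrix whose first row is the unit vector $l_j = a_j\sqrt{v_j}/\sqrt{(\Sigma a^2v)}$; the coefficients and the sample point are arbitrary trial values.

```python
from math import sqrt


def orthogonal_matrix_with_first_row(l):
    """Householder reflection H = I - 2 w w' / (w'w) with w = l - e_1.

    For a unit vector l, H is symmetric and orthogonal and its first row is l.
    """
    n = len(l)
    w = [l[j] - (1.0 if j == 0 else 0.0) for j in range(n)]
    ww = sum(c * c for c in w)
    identity = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    if ww == 0.0:                      # l is already the first axis
        return identity
    return [[identity[i][j] - 2.0 * w[i] * w[j] / ww for j in range(n)]
            for i in range(n)]


# Hypothetical coefficients a_j and variances v_j of the linear function (10.16).
a = [1.0, -2.0, 0.5]
v = [4.0, 1.0, 9.0]
scale = sqrt(sum(aj * aj * vj for aj, vj in zip(a, v)))   # sqrt(sum a_j^2 v_j)
l = [aj * sqrt(vj) / scale for aj, vj in zip(a, v)]       # first row, sum l^2 = 1

H = orthogonal_matrix_with_first_row(l)

# An arbitrary sample point in the xi-space and its transform.
xi = [0.3, -1.2, 2.0]
zeta = [sum(H[i][j] * xi[j] for j in range(3)) for i in range(3)]

print(sum(c * c for c in xi), sum(c * c for c in zeta))   # equal sums of squares
print(zeta[0] * scale)                                    # the linear function z
```

The first new variable, multiplied by $\sqrt{(\Sigma a^2v)}$, reproduces $z$, which is why $z$ has variance $\Sigma a_j^2v_j$.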


The Geometrical Method 

10.4. A considerable amount of cumbrous analysis may usually be avoided by the
use of geometrical representation of the domain of integration. We may imagine the values
$x_1 \dots x_n$ attaching to any given sample as the co-ordinates of a point in an n-dimensional
Euclidean hyperspace. The function $dF(x_1) \dots dF(x_n)$ may then be regarded as the
density at the point and the total frequency between $z_1$ and $z_2$ will be the integral of this
density (the weight) in a region lying between the two loci $z(x_1 \dots x_n) = z_1$ and
$z(x_1 \dots x_n) = z_2$, which in general will be hypersurfaces in the n-fold space, i.e. will
themselves be spaces of (n - 1) dimensions. The distribution function of $z$ will be the
total weight between the hypersurface corresponding to $z = -\infty$ and that corresponding
to $z$; and the frequency function will be the element of weight between the hypersurfaces
$z - \tfrac{1}{2}dz$ and $z + \tfrac{1}{2}dz$.


Example 10.3

Consider again the problem of Example 10.2. In the n-fold $\xi$-space the density is
given by

$$\frac{1}{(2\pi)^{n/2}}\exp(-\tfrac{1}{2}\Sigma\xi^2).$$

The statistic $z$ $(= \Sigma a_jx_j)$ determines a hyperplane

$$z = \Sigma a_j\sqrt{v_j}\,\xi_j \tag{10.23}$$

and we have to find the total weight between this hyperplane and the corresponding
hyperplane at $-\infty$, i.e. the weight on one side (the “lower” side) of the hyperplane (10.23).

Now $\Sigma\xi^2$ is the square of the distance of the point $(\xi_1 \dots \xi_n)$ from the origin and is
therefore unchanged by any rotation of the co-ordinate axes. Choose such a rotation
which brings the axis of one variable perpendicular to the hyperplane (10.23), meeting it
in $Q$. Let $P$ be the sample point and $O$ the origin. Then

$$\Sigma\xi^2 = OP^2 = OQ^2 + QP^2,$$

so that the density at $P$ is

$$\frac{1}{(2\pi)^{n/2}}e^{-\frac{1}{2}OQ^2}e^{-\frac{1}{2}QP^2}.$$

For variation over the hyperplane $OQ^2$ is constant and the integral of $e^{-\frac{1}{2}QP^2}$ is thus a
constant independent of $OQ$. Hence the frequency function of $z$ is given by

$$f(z) = ke^{-\frac{1}{2}OQ^2},$$

$k$ being some constant.

But $OQ$ is the distance from $O$ to the hyperplane and is given by

$$OQ = \frac{z}{\sqrt{(\Sigma a_j^2v_j)}}.$$

Hence

$$OQ^2 = \frac{z^2}{\Sigma a_j^2v_j},$$

i.e. $z$ is distributed normally with variance $\Sigma a_j^2v_j$ about zero mean.

The reader will find it instructive to compare this example with the previous one.
They are, in effect, the same thing expressed in different language.


Example 10.4

Consider again the illustration of 10.2. The elegance of the geometrical approach is
well brought out by the analogous derivation of the result there obtained.

In fact, our density function, as before, is given by

$$\frac{1}{(2\pi)^{n/2}}\exp(-\tfrac{1}{2}\Sigma x^2).$$

We require the distribution of the statistic $z = \Sigma x^2$ and the density is obviously constant
over the surface $z$ = constant, that is to say the (n - 1)-dimensional hypersphere. The
frequency function of $z$ is then the integral of this constant density between the hyperspheres
$z$ and $z + dz$, i.e. is proportional to $e^{-\frac{1}{2}z}$ times the element of the volume of the
hypersphere, which itself is proportional to the nth power of the radius $OP$. Thus we have

$$dF = ke^{-\frac{1}{2}z}\,\frac{d}{dz}(OP^n)\,dz \propto e^{-\frac{1}{2}z}z^{\frac{n-2}{2}}\,dz,$$

giving, on evaluation of the constant,

$$dF = \frac{1}{2^{\frac{n}{2}}\Gamma\!\left(\frac{n}{2}\right)}e^{-\frac{1}{2}z}z^{\frac{n-2}{2}}\,dz$$

as before.

Now suppose that the quantities $x_1 \dots x_n$, while still being normally distributed with
unit variance, are subject to $p$ linear restrictions of type

$$\Sigma a_jx_j = 0.$$

In the n-space the variables $x$ will then be constrained to lie on $p$ hyperplanes. The first
will cut the hypersphere of constant density in a hypersphere of one lower dimension, also,
of course, of constant density; the second will cut this in a hypersphere of one lower
dimension still, and so on. The result of the linear restrictions will be to constrain the
variables to a hypersphere of $p$ lower dimensions, and thus the distribution of $z$ in these
circumstances will be as before, but with $n - p$ instead of $n$, i.e.

$$dF = \frac{1}{2^{\frac{n-p}{2}}\Gamma\!\left(\frac{n-p}{2}\right)}e^{-\frac{1}{2}z}z^{\frac{n-p-2}{2}}\,dz. \tag{10.24}$$


Example 10.5. The sampling distribution of the mean and variance in normal samples

Writing $\bar{x}$ for the mean of a sample, we have, for the variance $s^2$,

$$s^2 = \frac{1}{n}\Sigma(x - \bar{x})^2 = \frac{1}{n}\Sigma x^2 - \bar{x}^2.$$

In samples from a normal population with zero mean and unit variance the density at the
point $x_1 \dots x_n$ is proportional to

$$\exp(-\tfrac{1}{2}\Sigma x^2) = \exp\{-\tfrac{1}{2}(ns^2 + n\bar{x}^2)\}. \tag{10.25}$$

Let us find the sampling distributions of $s$ and $\bar{x}$. From (10.25) it is seen that the density
function can be expressed simply in terms of those quantities, and we then have to find some
transformation of the volume element $dx_1 \dots dx_n$.

In the n-space consider the unit vector whose direction cosines are
$\left(\dfrac{1}{\sqrt{n}}, \dfrac{1}{\sqrt{n}}, \dots \dfrac{1}{\sqrt{n}}\right)$,
say $OQ$ where $O$ is the origin. If $P$ is the sample point, let $PM$ be the perpendicular from
$P$ on to $OQ$. Then the length of $OM$ is

$$\frac{x_1}{\sqrt{n}} + \frac{x_2}{\sqrt{n}} + \dots + \frac{x_n}{\sqrt{n}} = \bar{x}\sqrt{n}.$$

The length of $OP$ is $\sqrt{(\Sigma x^2)}$. Thus the length of $PM$ is $\sqrt{(\Sigma x^2 - n\bar{x}^2)} = s\sqrt{n}$.
The element of volume at $P$ may be regarded as the product of an elemental increment in
$OM$, equal to $d\bar{x}\sqrt{n}$, and the elemental volume in the perpendicular hyperplane through $M$.
In the hyperplane the contours of equal density, as in the last example, are hyperspheres
of radius $s\sqrt{n}$ centred at $M$, and consequently the element of volume is equal to $ks^{n-2}\,d\bar{x}\,ds$
multiplied by other elements which need not concern us since they are independent of
$\bar{x}$ and $s$. We have then for the element of frequency

$$dF \propto \exp\{-\tfrac{1}{2}(ns^2 + n\bar{x}^2)\}\,s^{n-2}\,d\bar{x}\,ds \tag{10.26}$$

and this splits into two factors

$$dF \propto e^{-\frac{1}{2}n\bar{x}^2}\,d\bar{x} \tag{10.27}$$

$$dF \propto e^{-\frac{1}{2}ns^2}s^{n-2}\,ds. \tag{10.28}$$

Thus in samples from a normal population the distributions of mean and variance are
independent. Equation (10.27) is equivalent to the result found in Examples 10.2 and 10.3.
Equation (10.28) is new. We have

$$dF \propto e^{-\frac{1}{2}ns^2}(s^2)^{\frac{n-3}{2}}\,d(s^2)$$

and, on evaluation of the constant,

$$dF = \frac{n^{\frac{n-1}{2}}}{2^{\frac{n-1}{2}}\Gamma\!\left(\frac{n-1}{2}\right)}e^{-\frac{1}{2}ns^2}(s^2)^{\frac{n-3}{2}}\,d(s^2), \qquad 0 \leq s^2 \leq \infty. \tag{10.29}$$



It is interesting to compare this with the distribution of the previous example. In the
latter case we found the distribution of the sum of squares of the variables measured from
a fixed point. In this case we have found the distribution of $\dfrac{1}{n}$th of the sum of the squares
measured from the sample mean. A comparison of the form (10.29) with that of (10.24)
shows that the distribution of variances is, except for constants, the same as that of sums
of squares when subject to one linear constraint.
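
The constant in (10.29) may be checked numerically. The sketch below, offered only as an illustration, evaluates the frequency function of $s^2$ in samples from a normal population of unit variance and confirms by a midpoint rule that it integrates to unity with mean $(n - 1)/n$; the function names, the choice n = 6 and the finite upper limit are assumptions of the illustration.

```python
from math import exp, lgamma, log


def variance_density(u: float, n: int) -> float:
    """Frequency function (10.29) of u = s^2 for a normal parent of unit variance."""
    if u <= 0.0:
        return 0.0
    half = 0.5 * (n - 1)
    log_f = half * log(0.5 * n) - lgamma(half) - 0.5 * n * u + (half - 1.0) * log(u)
    return exp(log_f)


def midpoint_integral(func, lower: float, upper: float, steps: int = 200_000) -> float:
    """Midpoint-rule approximation to the integral of func over (lower, upper)."""
    width = (upper - lower) / steps
    return width * sum(func(lower + (i + 0.5) * width) for i in range(steps))


n = 6
total = midpoint_integral(lambda u: variance_density(u, n), 0.0, 40.0)
mean = midpoint_integral(lambda u: u * variance_density(u, n), 0.0, 40.0)
print(total, mean, (n - 1) / n)
```

The mean $(n - 1)/n$ is the value to be expected for the sample variance when the parent variance is unity.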


Example 10.6. “Student's” distribution

In the previous example we have

$$\frac{\bar{x}\sqrt{n}}{s\sqrt{n}} = \frac{OM}{PM} = \cot\phi,$$

where $\phi$ is the angle $POM$.

If, then, we define a statistic $z = \dfrac{\bar{x}}{s}$, $z$ will be constant over the cone obtained by
rotating $PO$ about the unit vector, keeping the angle $\phi$ constant. The distribution of $z$ will
then be given by determining the weight between the cones defined by $\phi$ and $\phi + d\phi$.

Consider the intersection of these cones with the hypersphere of radius $OP$. They
will cut off an annulus on the sphere whose “content” (the n-dimensional analogue of
volume) will be proportional to $OP^{n-1}\sin^{n-2}\phi\,d\phi$.

The density function is constant and proportional to $e^{-\frac{1}{2}OP^2}$ on the hypersphere and thus
the total frequency between the cones will be proportional to

$$\sin^{n-2}\phi\,d\phi\int_0^\infty e^{-\frac{1}{2}OP^2}OP^{n-1}\,d(OP) \propto \sin^{n-2}\phi\,d\phi, \qquad 0 \leq \phi \leq \pi.$$

The distribution of $z$ $(= \cot\phi)$ is then given by

$$dF \propto \frac{k\,dz}{(1 + z^2)^{\frac{n}{2}}}$$

or, on evaluation of the constant,

$$dF = \frac{\Gamma\!\left(\frac{n}{2}\right)}{\sqrt{\pi}\,\Gamma\!\left(\frac{n-1}{2}\right)}\,\frac{dz}{(1 + z^2)^{\frac{n}{2}}}. \tag{10.30}$$

Since $z$ is the ratio of two functions of the variables of unit dimension this distribution
holds for samples from a normal population irrespective of the scale, that is to say,
irrespective of the variance of the parent population.

The distribution is usually put in a slightly different form. Put

$$t = \frac{\bar{x}\sqrt{n}}{\left\{\dfrac{\Sigma(x - \bar{x})^2}{n - 1}\right\}^{\frac{1}{2}}} = \sqrt{(n - 1)}\,z.$$

(10.30) then becomes

$$dF = \frac{1}{\sqrt{(n - 1)}\,B\!\left(\frac{n-1}{2}, \frac{1}{2}\right)}\,\frac{dt}{\left(1 + \dfrac{t^2}{n - 1}\right)^{\frac{n}{2}}} \tag{10.31}$$

$$\phantom{dF} = \frac{\Gamma\!\left(\frac{\nu + 1}{2}\right)}{\sqrt{(\nu\pi)}\,\Gamma\!\left(\frac{\nu}{2}\right)}\,\frac{dt}{\left(1 + \dfrac{t^2}{\nu}\right)^{\frac{\nu + 1}{2}}}, \tag{10.32}$$

where $\nu = n - 1$.

This celebrated expression is known as “Student's” distribution after the nom de plume
of its discoverer (1908).* The distribution function may be evaluated from the incomplete
B-function, but special tables have been prepared. One such, due to “Student” himself,
is given as Appendix Table 3.
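
The passage from the z-form (10.30) to the t-form (10.32) may be checked by direct computation. The sketch below is illustrative only: it codes the two frequency functions as written above and verifies that $f_t(t) = f_z(t/\sqrt{\nu})/\sqrt{\nu}$ with $\nu = n - 1$; the function names and trial values are assumptions.

```python
from math import exp, lgamma, log, pi, sqrt


def student_z_density(z: float, n: int) -> float:
    """Frequency function (10.30) of z = mean / s in samples of n."""
    log_c = lgamma(0.5 * n) - lgamma(0.5 * (n - 1)) - 0.5 * log(pi)
    return exp(log_c - 0.5 * n * log(1.0 + z * z))


def student_t_density(t: float, nu: int) -> float:
    """Frequency function (10.32) of t = sqrt(n - 1) * z, with nu = n - 1."""
    log_c = lgamma(0.5 * (nu + 1)) - lgamma(0.5 * nu) - 0.5 * log(nu * pi)
    return exp(log_c - 0.5 * (nu + 1) * log(1.0 + t * t / nu))


n = 8
nu = n - 1
for t in (0.0, 0.5, 1.5, 3.0):
    from_z = student_z_density(t / sqrt(nu), n) / sqrt(nu)   # change of variable
    print(t, student_t_density(t, nu), from_z)
```

For $\nu = 1$ the t-form reduces to the distribution of Example 10.1, with ordinate $1/\pi$ at the origin.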


Example 10.7. Distribution of the mean of samples from a rectangular population

Consider now a sample of n values from the rectangular distribution

$$dF = dx, \qquad 0 \leq x \leq 1.$$

In the n-space the density function will be a constant everywhere inside a hypercube

$$0 \leq x_j \leq 1, \qquad j = 1, \dots n \tag{10.33}$$

and zero elsewhere. The unit vector will be along the diagonal of this cube. If $P$ is the
sample point $(x_1 \dots x_n)$ and $PM$ the perpendicular on to this diagonal, then, as shown
in Example 10.5, $OM = \bar{x}\sqrt{n}$. Thus, for the distribution of $\bar{x}$ we require the element of
weight (which in this case is proportional to the element of volume) between the hyperplanes
$\bar{x}$ and $\bar{x} + d\bar{x}$; and this is equivalent to finding the content of the hyperplane (its “area”)
cut off by the various faces of the hypercube. The complication of the problem arises from
the fact that as $\bar{x}$ increases this region changes its shape according to the number of edges
of the hypercube cut by the hyperplane.

Consider the “quadrants”

$$x_j \geq r_j, \qquad r_j = 0 \text{ or } 1, \qquad j = 1, 2, \dots n \tag{10.34}$$

whose corners are the corners of the hypercube. Any one of the corners may have 0 or
1 or 2 $\dots$ or n of its co-ordinates equal to unity and the rest zero. We divide the quadrants
into (n + 1) sets according as the corner has 0, 1, $\dots$ n of its co-ordinates equal to unity,
that is, according as

$$r = \sum_{j=1}^{n} r_j$$

is equal to 0, 1, $\dots$ n. A quadrant of the $t$th set may be called $Q_t$. There will be $\binom{n}{t}$
different $Q_t$'s. Let $S$ be any point of $Q_0$, i.e. any point whose co-ordinates are all



* Strictly speaking, “Student's” distribution is that of (10.30), the modified form (10.32) being
due to R. A. Fisher. The latter form is therefore sometimes referred to as Fisher's t-distribution.





and let just $s$ of its co-ordinates be > 1. Then $S$ will belong to just $\binom{s}{1}$ of the $Q_1$'s,
$\binom{s}{2}$ of the $Q_2$'s, and so on. Now if $s > 0$,

$$\sum_{r=0}^{s} (-1)^r \binom{s}{r} = (1 - 1)^s = 0. \tag{10.35}$$

Hence, if whenever a point belongs to a $Q_r$ we give it a density $(-1)^r$ and then sum over
all $Q$, the resultant density will be 1 or 0 according as the point belongs to the hypercube
or not.

Let the segment of the hyperplane

$$z = \sum(x) \tag{10.36}$$

lying in $Q_0$ have content $V_n(z)$. Then the segment lying in any member of (10.34) will
have content $V_n(z - r)$, which is zero if $r > z$. Further, the segment of (10.36) lying in
the hypercube (10.33) will have the content

$$\sum_{r=0}^{k} (-1)^r \binom{n}{r} V_n(z - r) \tag{10.37}$$

where $k = [z]$ = the greatest integer less than $z$.

To find $V_n(z)$, let $W_{n-1}(z)$ be the projection of $V_n(z)$ perpendicular to one of the axes,
so that

$$V_n(z) = \sqrt{n}\, W_{n-1}(z).$$

Now $W_n(z)$ is the content of the $n$-dimensional region bounded by (10.36) and the co-ordinate
hyperplanes, a region whose base is therefore of content $V_n(z)$. The perpendicular from
$O$ to this base is $z/\sqrt{n}$, so that $W_n(z) = \dfrac{1}{n} \cdot \dfrac{z}{\sqrt{n}}\, V_n(z)$. Hence

$$V_n(z) = \frac{z\sqrt{n}}{(n-1)\sqrt{(n-1)}}\, V_{n-1}(z). \tag{10.38}$$

Since $V_2(z) = z\sqrt{2}$, repeated applications of this formula give

$$V_n(z) = \frac{\sqrt{n}\, z^{n-1}}{(n-1)!}.$$


Substituting in (10.37) we find for the content of the region common to the hypercube
and the hyperplane

$$\frac{\sqrt{n}}{(n-1)!} \sum_{r=0}^{k} (-1)^r \binom{n}{r} (z - r)^{n-1} \tag{10.39}$$

for values of $z$ between $k$ and $k + 1$.

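Since the perpendicular distance of the hyperplane (10.36) from the origin is $z/\sqrt{n}$, so that
the element of volume is (10.39) multiplied by $dz/\sqrt{n}$, the distribution of the mean $m = \dfrac{z}{n}$
is given by

$$f(m) = \frac{n^n}{(n-1)!} \sum_{r=0}^{k} (-1)^r \binom{n}{r} \left(m - \frac{r}{n}\right)^{n-1}, \qquad \frac{k}{n} \le m \le \frac{k+1}{n}. \tag{10.40}$$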


This is the required distribution. It is unusual in consisting of $n$ arcs of degree $(n - 1)$
in $m$, having $(n - 1)$-point contact at their joins, that is at the points $\dfrac{k}{n}$ $(k = 1, 2, \ldots n)$.

The distribution is symmetrical since the hyperplane $z$ = constant is perpendicular to
the long diagonal, which itself is an axis of symmetry of the hypercube.

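For particular values $n = 2, 3, 4$, (10.40) gives the following results for the frequency
function:

$$n = 2: \quad f(m) = \begin{cases} 4m, & 0 \le m \le \tfrac12 \\ 4(1 - m), & \tfrac12 \le m \le 1 \end{cases}$$

$$n = 3: \quad f(m) = \begin{cases} \tfrac{27}{2}\, m^2, & 0 \le m \le \tfrac13 \\ \tfrac{27}{2}\left\{m^2 - 3\left(m - \tfrac13\right)^2\right\}, & \tfrac13 \le m \le \tfrac23 \\ \tfrac{27}{2}\,(1 - m)^2, & \tfrac23 \le m \le 1 \end{cases}$$

$$n = 4: \quad f(m) = \begin{cases} \tfrac{128}{3}\, m^3, & 0 \le m \le \tfrac14 \\ \tfrac{128}{3}\left\{m^3 - 4\left(m - \tfrac14\right)^3\right\}, & \tfrac14 \le m \le \tfrac12 \\ \tfrac{128}{3}\left\{(1 - m)^3 - 4\left(\tfrac34 - m\right)^3\right\}, & \tfrac12 \le m \le \tfrac34 \\ \tfrac{128}{3}\,(1 - m)^3, & \tfrac34 \le m \le 1. \end{cases}$$

If the frequency curve be drawn it will be found to resemble a normal curve in appearance.
The distribution, of course, tends to normality as $n$ increases in virtue of the Central
Limit Theorem.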
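The formula for the distribution of the mean of $n$ rectangular variates can be checked numerically. The following is a minimal sketch, assuming the form $f(m) = \frac{n^n}{(n-1)!}\sum_{r=0}^{k}(-1)^r\binom{n}{r}(m - r/n)^{n-1}$ with $k$ the integral part of $nm$; the function names `mean_density` and `integrate` are illustrative and not part of the text.

```python
from math import comb, factorial, floor


def mean_density(m, n):
    """Frequency function of the mean of n rectangular variates on (0, 1)."""
    if m <= 0.0 or m >= 1.0:
        return 0.0
    k = floor(n * m)  # index of the arc containing m: k/n <= m < (k+1)/n
    total = sum((-1) ** r * comb(n, r) * (m - r / n) ** (n - 1)
                for r in range(k + 1))
    return n ** n / factorial(n - 1) * total


def integrate(f, a, b, steps=3000):
    """Midpoint rule; adequate here because the density is piecewise polynomial."""
    h = (b - a) / steps
    return h * sum(f(a + (i + 0.5) * h) for i in range(steps))


if __name__ == "__main__":
    for n in (2, 3, 4):
        area = integrate(lambda m: mean_density(m, n), 0.0, 1.0)
        print(n, area, mean_density(0.5, n))
```

For each $n$ the area under the curve should come out very close to unity, and the values at particular points can be compared with the arcs written out for $n = 2, 3, 4$.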


The Method of Characteristic Functions

10.5. It has already been noted that the characteristic function of the sum of
$n$ independent variables is the product of their characteristic functions. This simple
property enables us to find the sampling distribution of a wide class of statistics which
are expressible as sums, and particularly of the mean.

If we have a sample of $n$ values from a population whose characteristic function is
$\phi(t)$, the characteristic function of their sum is $\phi^n$. Thus the distribution function of their
sum $z$ is given by $F(z)$ where

$$F(z) - F(0) = \frac{1}{2\pi} \int_{-\infty}^{\infty} \frac{1 - e^{-izt}}{it}\, \phi^n(t)\, dt \tag{10.41}$$

and the frequency function is

$$f(z) = \frac{1}{2\pi} \int_{-\infty}^{\infty} e^{-izt}\, \phi^n(t)\, dt. \tag{10.42}$$





The following examples will illustrate the power of these results.

Example 10.8. Distribution of the Mean for the Binomial
The characteristic function of the binomial $(q + p)^m$ is

$$(q + pe^{it})^m.$$

The c.f. of the sampling distribution of the sum of $n$ values is then

$$(q + pe^{it})^{mn}$$

and that of the distribution of the mean is

$$\left(q + pe^{it/n}\right)^{mn}.$$

But this is the c.f. of the binomial

$$(q + p)^{mn}, \tag{10.43}$$

the interval being $\dfrac{1}{n}$ instead of unity; and hence this distribution is that of the mean.

Example 10.9. Distribution of the Mean for the Poisson Distribution

The characteristic function of the Poisson distribution whose general term is $\dfrac{e^{-\lambda}\lambda^r}{r!}$ is

$$\exp\{\lambda(e^{it} - 1)\}.$$

The c.f. of the mean is then

$$\exp\{n\lambda(e^{it/n} - 1)\}$$

and hence the distribution of the mean is the Poisson distribution, whose general term is

$$\frac{e^{-n\lambda}(n\lambda)^r}{r!}, \tag{10.44}$$

the interval being $\dfrac{1}{n}$ instead of unity.
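The result of this example can be checked without characteristic functions by convolving the Poisson frequencies directly. The following is a minimal sketch, assuming a Poisson parent with parameter written `lam`; the helper names `poisson_pmf`, `convolve` and `sum_pmf` are illustrative only. The sum of $n$ values should then have the Poisson frequencies with parameter $n\lambda$, and the mean takes the same frequencies at intervals of $1/n$.

```python
from math import exp, factorial


def poisson_pmf(lam, size):
    """First `size` terms e^{-lam} lam^r / r! of the Poisson distribution."""
    return [exp(-lam) * lam ** r / factorial(r) for r in range(size)]


def convolve(a, b):
    """Distribution of the sum of two independent variates, truncated to len(a) terms.

    Terms of index below len(a) are unaffected by the truncation, because a
    sum equal to k only involves components not exceeding k.
    """
    out = [0.0] * len(a)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            if i + j < len(out):
                out[i + j] += ai * bj
    return out


def sum_pmf(lam, n, size=40):
    """Distribution of the sum of n independent Poisson values, by repeated convolution."""
    base = poisson_pmf(lam, size)
    total = base
    for _ in range(n - 1):
        total = convolve(total, base)
    return total
```

Comparing `sum_pmf(lam, n)` term by term with `poisson_pmf(n * lam, 40)` gives agreement to rounding error.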


Example 10.10. Distribution of the Mean for the Normal Population
The characteristic function of the normal distribution

$$dF = \frac{1}{\sigma\sqrt{(2\pi)}} \exp\left\{-\frac{(x - \mu)^2}{2\sigma^2}\right\} dx$$

is $\exp\left(-\tfrac12 t^2\sigma^2 + it\mu\right)$.

The c.f. of the distribution of the mean of $n$ values is then

$$\exp n\left\{-\frac{t^2\sigma^2}{2n^2} + \frac{it\mu}{n}\right\} = \exp\left\{-\frac{t^2\sigma^2}{2n} + it\mu\right\}. \tag{10.45}$$

This is the c.f. of a normal distribution with mean $\mu$ and variance $\dfrac{\sigma^2}{n}$, which is therefore
the distribution required.





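Example 10.11. Distribution of the Mean for the Type III Population
The characteristic function of the distribution

$$dF = \frac{1}{a\Gamma(\gamma)}\, e^{-x/a} \left(\frac{x}{a}\right)^{\gamma - 1} dx, \qquad 0 \le x \le \infty;\ a > 0,$$

is

$$\frac{1}{(1 - ita)^{\gamma}}.$$

The c.f. of the distribution of the mean of $n$ values is then

$$\left(1 - \frac{ita}{n}\right)^{-n\gamma}.$$

This is the c.f. of the distribution

$$dF = \frac{1}{\Gamma(n\gamma)}\, e^{-nx/a} \left(\frac{nx}{a}\right)^{n\gamma - 1} \frac{n\, dx}{a}. \tag{10.46}$$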


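Example 10.12. Distribution of the Mean for the Rectangular Population
The characteristic function of the distribution $dF = dx$, $0 \le x \le 1$, is

$$\int_0^1 e^{itx}\, dx = \frac{e^{it} - 1}{it}.$$

The c.f. of the mean of $n$ values is then

$$\left\{\frac{e^{it/n} - 1}{it/n}\right\}^n,$$

and the frequency function is thus

$$f(x) = \frac{1}{2\pi} \int_{-\infty}^{\infty} e^{-itx} \left\{\frac{e^{it/n} - 1}{it/n}\right\}^n dt. \tag{10.47}$$

This integrand is everywhere holomorphic and the range of integration may then be
changed to the contour $\Gamma$ consisting of the real axis from $-\infty$ to $-c$, the small semicircle
of radius $c$ and centre at the origin, and the real axis from $c$ to $\infty$. Thus

$$f(x) = \frac{n^n}{2\pi i^n} \int_{\Gamma} \frac{e^{-itx}\left(e^{it/n} - 1\right)^n}{t^n}\, dt. \tag{10.48}$$

Now, taking the small semicircle to lie above the origin,

$$\int_{\Gamma} \frac{e^{igz}}{z^n}\, dz = 0 \quad \text{if } g > 0, \qquad = -\frac{2\pi i^n g^{n-1}}{(n-1)!} \quad \text{if } g < 0.$$

This may be seen by integrating along a contour consisting of $\Gamma$ and the infinite semicircle
above the real axis if $g > 0$ and below it if $g < 0$.

Substituting in (10.48) we find

$$f(x) = \frac{n^n}{2\pi i^n} \sum_{j=0}^{n} (-1)^{n-j} \binom{n}{j} \int_{\Gamma} \frac{e^{it(j/n - x)}}{t^n}\, dt = \frac{n^n}{(n-1)!} \sum_{j < nx} (-1)^j \binom{n}{j} \left(x - \frac{j}{n}\right)^{n-1}.$$

This, with a few changes of notation, is the same as (10.40).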

10.6. General expressions may also be derived for the distributions of geometric
means and the moments about fixed points.

In fact, if $y = \log x$, the characteristic function of $y$ is

$$\alpha(t) = \int_{-\infty}^{\infty} e^{it \log x}\, dF = \int_{-\infty}^{\infty} x^{it}\, dF.$$

The distribution of the sum of $n$ independent values of $y$, say $nz$, is then given by

$$F(nz) - F(0) = \frac{1}{2\pi} \int_{-\infty}^{\infty} \frac{1 - e^{-itnz}}{it}\, \alpha^n(t)\, dt \tag{10.49}$$

and the distribution of the mean is that of $z$. But $z = \log u$, where $u$ is the geometric
mean, and hence the distribution of $u$ may be found.

The frequency function, when it exists, is

$$f(nz) = \frac{1}{2\pi} \int_{-\infty}^{\infty} e^{-itnz}\, \alpha^n(t)\, dt.$$

Similarly the characteristic function of a power of the variate, say $x^r$, is given by

$$\beta(t) = \int_{-\infty}^{\infty} e^{itx^r}\, dF$$

and thus the distribution of the $r$th moment, say $z$, by

$$F(nz) - F(0) = \frac{1}{2\pi} \int_{-\infty}^{\infty} \frac{1 - e^{-itnz}}{it}\, \beta^n(t)\, dt. \tag{10.50}$$


Example 10.13. Distribution of the Geometric Mean in Samples from a Rectangular Population
If the population is

$$dF = \frac{dx}{a}, \qquad 0 \le x \le a,$$

the characteristic function of $\log x$ is

$$\int_0^a x^{it}\, \frac{dx}{a} = \frac{a^{it}}{1 + it}.$$

The frequency function of $v = \sum \log x$ is then given by

$$f(v) = \frac{1}{2\pi} \int_{-\infty}^{\infty} \frac{a^{itn} e^{-itv}}{(1 + it)^n}\, dt = \frac{1}{2\pi} \int_{-\infty}^{\infty} \frac{e^{it(n \log a - v)}}{(1 + it)^n}\, dt, \qquad n \log a - v \ge 0.$$

This integral may be evaluated in the manner of Example 10.12 and we find

$$f(v) = \frac{e^{v}\,(n \log a - v)^{n-1}}{a^n\,(n-1)!},$$

whence, putting $z = e^{v/n}$, we find for the distribution of the geometric mean $z$

$$dF = \frac{n^n z^{n-1}}{a^n\,(n-1)!} \left(\log \frac{a}{z}\right)^{n-1} dz. \tag{10.51}$$
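A numerical check of the distribution of the geometric mean is easily made. The following is a minimal sketch, assuming the frequency function $\frac{n^n z^{n-1}}{a^n (n-1)!}\left(\log\frac{a}{z}\right)^{n-1}$ on $0 \le z \le a$; the names `geometric_mean_density` and `integrate` are illustrative. Besides the total frequency, the mean can be compared with $a\{n/(n+1)\}^n$, which follows from the independence of the $n$ factors $x^{1/n}$.

```python
from math import factorial, log


def geometric_mean_density(z, n, a=1.0):
    """Frequency function of the geometric mean of n values from dF = dx/a on (0, a)."""
    if z <= 0.0 or z >= a:
        return 0.0
    return (n ** n * z ** (n - 1) * log(a / z) ** (n - 1)
            / (a ** n * factorial(n - 1)))


def integrate(f, lo, hi, steps=20000):
    """Midpoint rule; the end-points, where the logarithm is awkward, are never evaluated."""
    h = (hi - lo) / steps
    return h * sum(f(lo + (i + 0.5) * h) for i in range(steps))


if __name__ == "__main__":
    n, a = 3, 2.0
    total = integrate(lambda z: geometric_mean_density(z, n, a), 0.0, a)
    mean = integrate(lambda z: z * geometric_mean_density(z, n, a), 0.0, a)
    print(total, mean, a * (n / (n + 1)) ** n)
```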


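Example 10.14. Distribution of the Second-order Moment about the Population Mean in
Samples from a Normal Population
If the distribution is

$$dF = \frac{1}{\sigma\sqrt{(2\pi)}}\, e^{-\frac{x^2}{2\sigma^2}}\, dx$$

the characteristic function of $x^2$ is

$$\frac{1}{\sigma\sqrt{(2\pi)}} \int_{-\infty}^{\infty} e^{itx^2 - \frac{x^2}{2\sigma^2}}\, dx = \frac{1}{(1 - 2\sigma^2 it)^{\frac12}}.$$

The c.f. of the mean of $n$ values, say $m_2$, is then

$$\frac{1}{\left(1 - \dfrac{2\sigma^2 it}{n}\right)^{\frac{n}{2}}} \tag{10.52}$$

and the frequency function of this is

$$\frac{1}{2\pi} \int_{-\infty}^{\infty} \frac{e^{-itm_2}}{\left(1 - \dfrac{2\sigma^2 it}{n}\right)^{\frac{n}{2}}}\, dt.$$

This may be integrated in the manner of the previous example, or the result written down
directly from the consideration that (10.52) is the characteristic function of the distribution

$$dF = \frac{1}{\left(\dfrac{2\sigma^2}{n}\right)^{\frac{n}{2}} \Gamma\left(\dfrac{n}{2}\right)}\, e^{-\frac{nm_2}{2\sigma^2}}\, m_2^{\frac{n}{2} - 1}\, dm_2, \tag{10.53}$$

a result which may be compared with that of Example 10.4, to which it is equivalent.

The Method of Induction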

10.7. The distribution of the sum of two independent variates may be obtained
directly without the intervention of characteristic functions. If $F_1(x_1)$ and $F_2(x_2)$ are
the distribution functions, the distribution function of $z = x_1 + x_2$ is given by

$$F(z) = \iint dF_1\, dF_2, \tag{10.54}$$

the domain of integration being that for which $x_1 + x_2 \le z$,

$$= \int_{-\infty}^{\infty} F_1(z - x_2)\, dF_2. \tag{10.55}$$

If, further, $F$ is differentiable, the frequency function of $z$ is given by

$$f(z) = \int_{-\infty}^{\infty} f_1(z - x_2)\, f_2(x_2)\, dx_2, \tag{10.56}$$

$f_1$ and $f_2$ being the frequency functions of $x_1$ and $x_2$.

(10.56) can be used to obtain successively the distribution of the sum of any number
of variables whose individual distributions are known. If all the variables have the same
distribution the general form may be suggested when the results for two or three variates
have been worked out. Its correctness can then be verified by induction. The following
examples illustrate the method.


Example 10.15

Consider again the distribution

$$dF = \frac{dx}{\pi(1 + x^2)}, \qquad -\infty \le x \le \infty.$$

By (10.56) the distribution of the sum of two independent variables each of which has this
distribution has the frequency function

$$\frac{1}{\pi^2} \int_{-\infty}^{\infty} \frac{dx}{\{1 + (z - x)^2\}(1 + x^2)} = \frac{2}{\pi(z^2 + 2^2)}.$$

This suggests the general form

$$\frac{n}{\pi(z^2 + n^2)}.$$

If this is correct, then the form for $(n + 1)$ variables is

$$\frac{n}{\pi^2} \int_{-\infty}^{\infty} \frac{dx}{(x^2 + n^2)\{1 + (z - x)^2\}} = \frac{n + 1}{\pi\{z^2 + (n + 1)^2\}}.$$

The result holds for $n = 1, 2$, and is therefore true in general.
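The inductive step of this example can be checked numerically. The following is a minimal sketch, assuming the conjectured frequency function $n/\{\pi(z^2 + n^2)\}$ for the sum of $n$ variates and the convolution formula $f(z) = \int f_1(z - x)\,f_2(x)\,dx$; the names `cauchy_sum_density` and `next_by_convolution` are illustrative.

```python
from math import pi, tan


def cauchy_sum_density(z, n):
    """Conjectured frequency function n / (pi (z^2 + n^2)) of the sum of n variates."""
    return n / (pi * (z * z + n * n))


def next_by_convolution(z, n, steps=20000):
    """Convolve the n-variate form with one further variate, numerically.

    The substitution x = tan(theta) maps the infinite range on to
    (-pi/2, pi/2) and cancels the factor 1 / (1 + x^2) of the single-variate
    frequency function, leaving only its constant 1 / pi.
    """
    h = pi / steps
    total = 0.0
    for i in range(steps):
        theta = -pi / 2 + (i + 0.5) * h
        total += cauchy_sum_density(z - tan(theta), n) / pi
    return total * h


if __name__ == "__main__":
    for n in (1, 2, 3):
        print(n, next_by_convolution(0.7, n), cauchy_sum_density(0.7, n + 1))
```

The two printed columns should agree closely for each $n$, which is the numerical counterpart of passing from $n$ to $n + 1$.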


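Example 10.16

In Example 10.4 we found that the distribution of the sum of squares $z$ of $n$ independent
normal variates is given by

$$dF = \frac{1}{2^{\frac{n}{2}}\, \Gamma\left(\dfrac{n}{2}\right)}\, e^{-\frac{z}{2}}\, z^{\frac{n-2}{2}}\, dz. \tag{10.57}$$

Suppose we had surmised this form from an examination of a few cases for low $n$. Let
$x$ be another variate distributed normally about zero mean with unit variance. We require
the distribution of $z + x^2$.

Let $x^2 = v$. Then $v$ has the distribution

$$dF = \frac{1}{2^{\frac12}\, \Gamma\left(\frac12\right)}\, e^{-\frac{v}{2}}\, v^{-\frac12}\, dv.$$

Then, from (10.56) the frequency function of the distribution of $u = z + v$ is given by

$$\int_0^u \frac{e^{-\frac12(u - z)}\,(u - z)^{-\frac12}}{2^{\frac12}\, \Gamma\left(\frac12\right)} \cdot \frac{e^{-\frac12 z}\, z^{\frac12(n-2)}}{2^{\frac{n}{2}}\, \Gamma\left(\frac{n}{2}\right)}\, dz
= \frac{e^{-\frac12 u}}{2^{\frac{n+1}{2}}\, \Gamma\left(\frac12\right) \Gamma\left(\frac{n}{2}\right)} \int_0^u (u - z)^{-\frac12}\, z^{\frac12(n-2)}\, dz
= \frac{e^{-\frac12 u}\, u^{\frac{n-1}{2}}}{2^{\frac{n+1}{2}}\, \Gamma\left(\frac{n+1}{2}\right)},$$

which is the same as (10.57) with $n + 1$ for $n$. Hence the distribution holds generally.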


The Distribution of a Ratio

10.8. Cases not infrequently arise in which we wish to find the sampling distribution
of the ratio of two independent statistics. The problem becomes somewhat complicated
when the divisor may be negative, but relatively simple in the contrary case.

If $F_1$, $F_2$ are the distribution functions of $z_1$ and $z_2$, and $v = z_1/z_2$, then for the distribution
function of $v$ we have

$$F(v) = \int_0^{\infty} F_1(v z_2)\, dF_2 \tag{10.58}$$

or, in terms of frequency functions,

$$f(v) = \int_0^{\infty} f_1(v z_2)\, f_2(z_2)\, z_2\, dz_2. \tag{10.59}$$


Example 10.17

Consider again the distribution of the ratio $\bar{x}/s$ discussed in Example 10.6. Here $\bar{x}$ is
the mean of samples of $n$ from a normal population and is thus distributed as

$$dF \propto e^{-\frac{n\bar{x}^2}{2\sigma^2}}\, d\bar{x}.$$

$s$ is distributed as

$$dF \propto e^{-\frac{ns^2}{2\sigma^2}}\, s^{n-2}\, ds,$$

as we have found in equation (10.29).

Then the distribution of $v = \dfrac{\bar{x}}{s}$ is, from (10.59), a constant times

$$\int_0^{\infty} e^{-\frac{nv^2s^2}{2\sigma^2}}\, e^{-\frac{ns^2}{2\sigma^2}}\, s^{n-2}\, s\, ds \propto \frac{1}{(1 + v^2)^{\frac{n}{2}}},$$

which then gives us the distribution (10.30) on the evaluation of the constant.
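The proportionality to $(1 + v^2)^{-n/2}$ can be confirmed by evaluating the ratio integral numerically at two values of $v$ and comparing the quotient. The following is a minimal sketch, assuming the integrand $f_1(vs)\,f_2(s)\,s$ with $f_1 \propto e^{-nx^2/2\sigma^2}$ and $f_2 \propto e^{-ns^2/2\sigma^2}s^{n-2}$; the function name `ratio_integral` and the truncation point of the range are illustrative choices.

```python
from math import exp


def ratio_integral(v, n, sigma=1.0, steps=20000):
    """Midpoint-rule value of the ratio integral for v = xbar / s, up to a constant."""
    upper = 12.0 * sigma  # assumed cut-off: the integrand is negligible beyond it
    h = upper / steps
    c = n / (2.0 * sigma ** 2)
    total = 0.0
    for i in range(steps):
        s = (i + 0.5) * h
        f1 = exp(-c * (v * s) ** 2)            # frequency of xbar at v*s
        f2 = exp(-c * s * s) * s ** (n - 2)    # frequency of s
        total += f1 * f2 * s
    return total * h


if __name__ == "__main__":
    n, v = 5, 0.8
    print(ratio_integral(v, n) / ratio_integral(0.0, n), (1 + v * v) ** (-n / 2))
```

Dividing by the value at $v = 0$ removes the unknown constant, so the two printed numbers should agree.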

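Example 10.18. Fisher’s z-distribution

Suppose we have two independent samples of $n_1$ and $n_2$ members respectively from
normal populations with variances $\sigma_1^2$ and $\sigma_2^2$. The distributions of the sample variances
$s_1^2$ and $s_2^2$ $\left\{= \dfrac{1}{n}\sum(x - \bar{x})^2\right\}$ are then

$$dF \propto e^{-\frac{n_1 s_1^2}{2\sigma_1^2}}\, s_1^{n_1 - 2}\, ds_1, \qquad 0 \le s_1 \le \infty$$

$$dF \propto e^{-\frac{n_2 s_2^2}{2\sigma_2^2}}\, s_2^{n_2 - 2}\, ds_2, \qquad 0 \le s_2 \le \infty.$$

The distribution of the ratio $t = \dfrac{s_1}{s_2}$ is then, from (10.59), given by

$$f(t) \propto \int_0^{\infty} s_2 \exp\left(-\frac{n_1 t^2 s_2^2}{2\sigma_1^2}\right) (t s_2)^{n_1 - 2} \exp\left(-\frac{n_2 s_2^2}{2\sigma_2^2}\right) s_2^{n_2 - 2}\, ds_2$$

$$\propto t^{n_1 - 2} \int_0^{\infty} s_2^{n_1 + n_2 - 3} \exp\left\{-\frac{s_2^2}{2}\left(\frac{n_1 t^2}{\sigma_1^2} + \frac{n_2}{\sigma_2^2}\right)\right\} ds_2$$

$$\propto \frac{t^{n_1 - 2}}{\left(\dfrac{n_1 t^2}{\sigma_1^2} + \dfrac{n_2}{\sigma_2^2}\right)^{\frac{n_1 + n_2 - 2}{2}}}, \qquad 0 \le t \le \infty. \tag{10.60}$$

This is usually expressed in a somewhat different form. Put

$$z = \tfrac12 \log \frac{n_1(n_2 - 1)\, s_1^2}{n_2(n_1 - 1)\, s_2^2} = \tfrac12 \log\left\{\frac{n_1(n_2 - 1)}{n_2(n_1 - 1)}\, t^2\right\}.$$

We find for the frequency function of $z$

$$f(z) \propto \frac{e^{(n_1 - 1)z}}{\left\{\dfrac{(n_1 - 1)e^{2z}}{\sigma_1^2} + \dfrac{n_2 - 1}{\sigma_2^2}\right\}^{\frac{n_1 + n_2 - 2}{2}}}, \qquad -\infty \le z \le \infty$$

or, writing $\nu_1 = n_1 - 1$ and $\nu_2 = n_2 - 1$, and evaluating the constant term

$$dF = \frac{2\left(\dfrac{\nu_1}{\sigma_1^2}\right)^{\frac{\nu_1}{2}} \left(\dfrac{\nu_2}{\sigma_2^2}\right)^{\frac{\nu_2}{2}}}{B\left(\dfrac{\nu_1}{2}, \dfrac{\nu_2}{2}\right)} \cdot \frac{e^{\nu_1 z}\, dz}{\left(\dfrac{\nu_1 e^{2z}}{\sigma_1^2} + \dfrac{\nu_2}{\sigma_2^2}\right)^{\frac12(\nu_1 + \nu_2)}}. \tag{10.61}$$

In particular, if $\sigma_1 = \sigma_2$ we have Fisher’s z-distribution of half the logarithm of the ratio
of two variances from a normal population

$$dF = \frac{2\,\nu_1^{\frac{\nu_1}{2}}\, \nu_2^{\frac{\nu_2}{2}}}{B\left(\dfrac{\nu_1}{2}, \dfrac{\nu_2}{2}\right)} \cdot \frac{e^{\nu_1 z}\, dz}{(\nu_1 e^{2z} + \nu_2)^{\frac12(\nu_1 + \nu_2)}}. \tag{10.62}$$

The distribution function of $z$ may be obtained from tables of the incomplete B-function.
Special tables showing, for various values of $\nu_1$ and $\nu_2$, the values of $z$ corresponding to
$F(z)$ = 0.99 and 0.95, have been prepared and are given as Appendix Tables 4 and 6.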
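The constant of Fisher's z-distribution can be checked by numerical integration. The following is a minimal sketch, assuming the frequency function $2\nu_1^{\nu_1/2}\nu_2^{\nu_2/2}e^{\nu_1 z}/\{B(\nu_1/2, \nu_2/2)(\nu_1e^{2z} + \nu_2)^{(\nu_1+\nu_2)/2}\}$ for equal population variances; the names `fisher_z_density` and `integrate`, and the finite range used in place of the infinite one, are illustrative choices.

```python
from math import exp, lgamma, log


def fisher_z_density(z, v1, v2):
    """Frequency function of Fisher's z with v1 and v2 degrees of freedom."""
    log_beta = lgamma(v1 / 2) + lgamma(v2 / 2) - lgamma((v1 + v2) / 2)
    log_f = (log(2.0) + 0.5 * v1 * log(v1) + 0.5 * v2 * log(v2) - log_beta
             + v1 * z - 0.5 * (v1 + v2) * log(v1 * exp(2 * z) + v2))
    return exp(log_f)


def integrate(f, lo, hi, steps=30000):
    """Midpoint rule over a finite range standing in for (-inf, inf)."""
    h = (hi - lo) / steps
    return h * sum(f(lo + (i + 0.5) * h) for i in range(steps))


if __name__ == "__main__":
    print(integrate(lambda z: fisher_z_density(z, 4, 7), -15.0, 15.0))
    print(fisher_z_density(0.3, 5, 5), fisher_z_density(-0.3, 5, 5))
```

The first figure should be close to unity; the second line illustrates that the curve is symmetrical about zero when $\nu_1 = \nu_2$.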





10.9. Up to this point we have been mainly concerned with the distribution of a
single statistic compiled from the members of a sample which is random and simple. The
methods may, however, readily be generalized to obtain the simultaneous distribution of
several statistics. For example, if there are several statistics $z_1, z_2 \ldots z_p$, and the joint
distribution of the sample values is represented by $F(x_1 \ldots x_n)$, the characteristic
function of the $z$’s is given by

$$\phi(t_1 \ldots t_p) = \int_{-\infty}^{\infty} \ldots \int_{-\infty}^{\infty} \exp(it_1z_1 + \ldots + it_pz_p)\, dF(x_1 \ldots x_n) \tag{10.63}$$

and the frequency function of the $z$’s (if it exists) by

$$f(z_1, \ldots z_p) = \frac{1}{(2\pi)^p} \int_{-\infty}^{\infty} \ldots \int_{-\infty}^{\infty} \exp(-it_1z_1 \ldots - it_pz_p)\, \phi(t_1 \ldots t_p)\, dt_1 \ldots dt_p. \tag{10.64}$$

Examples of the use of these results will occur in the sequel.


NOTES AND REFERENCES

A systematic account of the various methods for deriving sampling distributions has
not previously been given, except in regard to characteristic functions, as to which see
Kullback (1934). The geometrical method is largely due to R. A. Fisher, whose use of it
to derive the sampling distribution of the correlation coefficient (1915) is a beautiful example
of the power of the method (cf. Chapter 14). See also Uspensky (1937).

Some of the distributions derived in the foregoing examples are classical. For
“Student’s” distribution see his paper of 1908 and Fisher’s paper of 1925. The distribution
of the sums of squares of values from a normal population was discovered by Helmert
in 1876 but forgotten until Karl Pearson rediscovered it in 1900. The distribution of the
mean of samples from a rectangular population is traceable as far back as Lagrange
(Miscellanea Taurinensia, 1770-73), but was forgotten and rediscovered simultaneously by Hall
and Irwin (1927), the former using the geometrical method and the latter characteristic
functions. For the distribution of means from Pearson curves, see Irwin (1930). For
Fisher’s z distribution, see his paper of 1915 and that of 1924. For the distribution of
a ratio, see Cramér (1937) (Exercises 10.8-10.11 below), Geary (1930), Fieller (1932),
and Nicholson (1941). The distribution of the ratio of two normal variables exhibits some
unusual features; it may, for example, be bimodal.

Cramér, H. (1937), Random Variables and Probability Distributions, Cambridge University
Press.

Fieller, E. C. (1932), “The distribution of an index in a normal bivariate population,”
Biometrika, 24, 428.

Fisher, R. A. (1915), “The frequency distribution of the values of the correlation coefficient
in samples from an indefinitely large population,” Biometrika, 10, 507.

Fisher, R. A. (1924), “On a distribution yielding the error functions of several well-known
statistics,” Proc. International Math. Congress at Toronto, 805.

Fisher, R. A. (1925), “Applications of ‘Student’s’ distribution,” Metron, 5, No. 3, 90.

Geary, R. C. (1930), “The frequency distribution of the quotient of two normal variables,”
Jour. Roy. Statist. Soc., 93, 442.

Hall, P. (1927), “The distribution of means for samples of size N drawn from a population
in which the variate takes values between 0 and 1, all such values being equally
probable,” Biometrika, 19, 240.

Irwin, J. O. (1927), “On the frequency-distribution of the means of samples from a population
having any law of frequency with finite moments,” Biometrika, 19, 226,
and (1929), 21, 431.

Irwin, J. O. (1930), “On the frequency-distribution of the means of samples from populations
of certain of Pearson’s types,” Metron, 7, No. 4, 51.

Kullback, S. (1934), “An application of characteristic functions to the distribution problem
of statistics,” Ann. Math. Statist., 5, 263.

Kullback, S. (1935), “On samples from a multivariate normal population,” Ann. Math. Statist.,
6, 203.

Nicholson, C. (1941), “A geometrical analysis of the frequency distribution of the ratio
between two variables,” Biometrika, 32, 16.

Pearson, Karl (1900), “On the criterion that a given system of deviations from the probable
... is such that it can be reasonably supposed to have arisen from random
sampling,” Phil. Mag., 50, 167.

“Student” (1908), “The probable error of a mean,” Biometrika, 6, 1.

Uspensky, J. V. (1937), Introduction to Mathematical Probability, McGraw-Hill, New York
and London.


EXERCISES 

10.1. Derive by the method of characteristic functions the expression for the sampling
distribution of the mean of samples from the population

$$dF = \frac{dx}{\pi(1 + x^2)}, \qquad -\infty \le x \le \infty.$$

10.2. Show that the distribution of the geometric mean g in samples of n from the 
Type III population 

dF = dx 0 < a: < 00 

Hp) 


IS 


dF = _ Vf- l)n+n,+ir 1 

r(n) {Fip) ^ ^ IdP-^ Fit + 1)J,. 




(Kullback, 1934.) 

10.3. Show that the difference of two values drawn at random from the Poisson
population whose general term is $\dfrac{e^{-\lambda}\lambda^r}{r!}$ is distributed in the form whose general term is
$e^{-2\lambda} I_d(2\lambda)$, where $d$ can take all integral values from $-\infty$ to $\infty$ and $I_d(2\lambda)$ is Bessel’s
modified function of the first kind of order $d$ and argument $2\lambda$. (Cf. Example 4.5.)

(Irwin, 1937, Jour. Roy. Statist. Soc., 100, 415.)

10.4, Show that the distribution of the mean of samples of n from the Type II 
population 

dF oc dx p>0, 0<a;<l 

is given by 

1 


n —2 /*oo 

/{x)=|7r-a {r(p)}»j 


where J,{z) is the Bessel coefficient of order r in z. 


cos (nx/S) djff. 


(Irwin, 1927.) 





10.5. Show that the distribution of the geometric mean of $n$ variables, one from each
of the populations with frequency functions


c“* 


*c-* 


is the same as the distribution of the arithmetic mean of $n$ independent variables distributed
in the first of these forms.

(Kullback, 1934.)

10.6. Show that the difference of two independent variates, $z$, each of which is distributed
in the Type III form


has the frequency function 


dF = dx 


J-»(l 

2p~l 

2-T-r(p)r(i) * 


where K^{x) is the Bessel function of second order and imaginary argument. 

(K. Pearson, Stouffer and David, 1932, Biometrika, 24, 293.)

10.7. If a frequency function is given as the sum of a number of terms of the Type A


\a 


show that the sum S of n independent variates has a frequency function 
/(S) - .(S)|i + + ... 


where £ = a\^n and 

Af = i:-—---a,- . . . 

the summation being taken over all values of the for which 

3^3 + 4^4 + . . . + kv^ j. 

(Baker, 1930, Ann. Math. Statist., 1, 199.)

10.8. A theorem of Cramér’s (1937) states that if two independent variables, $x_1$ and
$x_2$, with finite mean values, distribution functions $F_1$ and $F_2$ and characteristic functions
$\phi_1$, $\phi_2$ are such that $F_2(0) = 0$, so that $x_2$ is non-negative, and $\displaystyle\int_1^{\infty} \left|\frac{\phi_2(t)}{t}\right| dt$ converges, then
the distribution function of $v = \dfrac{x_1}{x_2}$ is given by






_ dt 




and the frequency function, if it exists, by


f{v) = 


If- 


tv) dL 


^7tij -.06 

Use this result to obtain the distributions of Examples 10.17 and 10.18. 

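10.9. Show that the ratio $v$ of two independent normal variables has frequency function

$$f(v) = \frac{1}{\sqrt{(2\pi)}} \cdot \frac{m_2\sigma_1^2 + m_1\sigma_2^2 v}{(\sigma_1^2 + \sigma_2^2 v^2)^{\frac32}} \exp\left\{-\frac12 \cdot \frac{(m_1 - m_2 v)^2}{\sigma_1^2 + \sigma_2^2 v^2}\right\}$$

where $m_1$, $\sigma_1$ are the mean and standard deviation of the first variate, $m_2$, $\sigma_2$ those of the
second variate, and it is assumed that $m_2$ is so large compared with $\sigma_2$ that the range of
the second variate is effectively positive.

Hence show that $\dfrac{m_1 - m_2 v}{(\sigma_1^2 + \sigma_2^2 v^2)^{\frac12}}$ is normally distributed about zero mean with unit
variance.

(Geary, 1930.)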

10.10. Show that the ratio of two independent variables distributed as 
dF X 0 <iwi < a: < x 

dF X — m^Y*~^^dXy (Xim^ < a; < x 

has a frequency function 






(Px - 1)! 


1 + 


y^v 

Vi 


_ (pi — _ 


+ 


(P\ - '‘^\(^^\pt{pt + 

V 2 J\ylJ v.,A»>.+2 • 


ytj 


+ 


e^.p, 

{pi — 1)! Yi' 





(”■ r 'm 


(Pl + l)fP.-2 




+ 


where ^ ~ ttii — m^v. (This includes Fisher’s z-distribution as a particular case.) 


Xi 

10.11. Show that the ratio of two variates v — —y where Xi is distributed normally 

x*i 

with mean and variance and the second like a standard deviation in normal samples, 
i.e. with distribution function given by 

dF X ds 0 < m 2 <s < oo 

has a frequency function given by 




. .'{<r(p+i) ^ ^ +^4i) 


where f = mi — m,?;. 

(This includes “ Student’s ” distribution as a particular case.) 



CHAPTER 11 


APPROXIMATIONS TO SAMPLING DISTRIBUTIONS 

11.1. In the previous chapter we have considered methods of deriving sampling
distributions in an exact form when the parent population is completely specified. Those
methods are not applicable when the parent is not completely known, and they may in
any case lead to results which are difficult to apply in practice, e.g. by yielding an integral
which has not been tabulated. In such cases we can frequently deal with the problem
by finding approximate forms for the sampling distribution, particularly by ascertaining
its lower moments and then fitting a tractable type of curve such as one of the Pearson
class.

A procedure of this kind has, in fact, already been considered in Chapter 9, wherein
it was seen that approximate expressions could be derived for the first and second moments
of sampling distributions in terms of the lower moments of the parent. When the sampling
distribution tends to normality this, in effect, solves our problem, for the first and second
moments determine a normal distribution. The methods of this chapter are really developments
of this idea. We shall discuss exact methods of finding the moments of sampling
distributions in terms of parent moments. Our results are important not only on their
own account, but in giving an accurate method of judging the degree of approximation
of the expressions for large $n$ discussed in Chapter 9. In particular we shall be able to take
up some points which had to be left on one side in that chapter, e.g. the rapidity with
which some functions of the moments such as $\sqrt{b_1}$ approach normality.

11.2. It is as well to recall that there are three different types of moment concerned
in the investigation: (a) the moments of the parent population, (b) the moments of the
sample and (c) the moments of the sampling distribution. They will be referred to as
parent-moments (parameters), sample-moments (moment-statistics) and sampling-moments
respectively. Similarly we shall consider parent-cumulants, sample-cumulants and
sampling-cumulants.

11.3. In Chapter 9 we obtained the exact results

$$E(m'_r) = \mu'_r, \qquad \operatorname{var}(m'_r) = E(m'_r - \mu'_r)^2 = \frac{\mu'_{2r} - \mu'^2_r}{n} \tag{11.1}$$

and noted that formulae for sampling moments about the mean were more difficult to obtain.
Although we shall later reject this approach in favour of another, it is instructive to consider
what happens if we try to generalize the procedure of that chapter to our present problem.
Suppose, for example, we are interested in the sampling distribution of the variance. The
above equations give us the first two sampling moments of the second moment about an
arbitrary point. For the first sampling moment of the variance we have

$$E(m_2) = E\left[\frac{1}{n}\sum(x^2) - \left\{\frac{1}{n}\sum(x)\right\}^2\right] = \mu'_2 - \frac{1}{n}\mu'_2 - \frac{n - 1}{n}\mu'^2_1 = \frac{n - 1}{n}\mu_2. \tag{11.2}$$

This is exact and may be compared with the approximate expression given by the methods
of Chapter 9, viz.:

$$E(m_2) = \mu_2. \tag{11.3}$$
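The exact result $E(m_2) = \frac{n-1}{n}\mu_2$ can be verified by complete enumeration for a small discrete parent. The following is a minimal sketch, assuming a parent that takes a finite set of integer values with given probabilities; the function names and the particular parent used are illustrative. Rational arithmetic is used so that the comparison is exact.

```python
from fractions import Fraction
from itertools import product


def sample_m2(xs):
    """Second sample moment about the sample mean: (1/n) S(x^2) - {(1/n) S(x)}^2."""
    n = len(xs)
    return Fraction(sum(x * x for x in xs), n) - Fraction(sum(xs), n) ** 2


def expected_m2(values, probs, n):
    """Exact E(m_2) over all ordered samples of n from a finite discrete parent."""
    total = Fraction(0)
    for idx in product(range(len(values)), repeat=n):
        weight = Fraction(1)
        for i in idx:
            weight *= probs[i]  # the members of the sample are independent
        total += weight * sample_m2([values[i] for i in idx])
    return total


def parent_mu2(values, probs):
    """Second moment of the parent about its mean."""
    mean = sum(p * v for v, p in zip(values, probs))
    return sum(p * (v - mean) ** 2 for v, p in zip(values, probs))


if __name__ == "__main__":
    values = [0, 1, 3]
    probs = [Fraction(1, 2), Fraction(1, 3), Fraction(1, 6)]
    n = 3
    print(expected_m2(values, probs, n), Fraction(n - 1, n) * parent_mu2(values, probs))
```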

We might then proceed to find the second, third . . . sampling moments of the variance
and thus obtain more and more information about its sampling distribution. For example,
we have for the fourth moment

$$E(m_2^4) = E\left[\frac{1}{n}\sum(x^2) - \left\{\frac{1}{n}\sum(x)\right\}^2\right]^4 = E\left[\frac{1}{n^4}\left\{\sum(x^2)\right\}^4 - \frac{4}{n^5}\left\{\sum(x^2)\right\}^3\left\{\sum(x)\right\}^2 + \ldots\right]. \tag{11.4}$$

We can then find the expectations of the individual terms by an easy extension of the
method already used. We express any power in terms of products of the type $\sum x_j^a x_k^b \ldots$;
when $j \ne k \ne \ldots$ the mean value of such a product of $t$ factors, the $x$’s being independent,
is $n(n - 1) \ldots (n - t + 1)\, \mu'_a \mu'_b \ldots$ Without loss of generality we may take our
origin at the mean of the parent, so that $\mu'_1 = 0$ and other moments are those about the
mean of the parent. The rest is mere algebra. For example, for the first term in (11.4)
we have

$$\left\{\sum(x^2)\right\}^4 = (x_1^2 + x_2^2 + \ldots + x_n^2)^4 = \sum x_j^8 + 4\sum x_j^6 x_k^2 + 6\sum x_j^4 x_k^2 x_l^2 + 3\sum x_j^4 x_k^4 + \sum x_j^2 x_k^2 x_l^2 x_m^2. \tag{11.5}$$

The numerical coefficients require a little watching. That of $x_j^4 x_k^4$, for example, is 3, not
6 as in the multinomial expansion of $(x_1^2 + \ldots + x_n^2)^4$, because $j$ and $k$ can be interchanged.
The mean value of (11.5) is then

$$n\mu_8 + 4n(n - 1)\mu_6\mu_2 + 6n(n - 1)(n - 2)\mu_4\mu_2^2 + 3n(n - 1)\mu_4^2 + n(n - 1)(n - 2)(n - 3)\mu_2^4.$$
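The coefficients in the mean value of $\{\sum(x^2)\}^4$ can be checked by complete enumeration. The following is a minimal sketch, assuming a finite discrete parent measured from its own mean and the expression $n\mu_8 + 4n(n-1)\mu_6\mu_2 + 6n(n-1)(n-2)\mu_4\mu_2^2 + 3n(n-1)\mu_4^2 + n(n-1)(n-2)(n-3)\mu_2^4$; the function names and the particular parent are illustrative.

```python
from fractions import Fraction
from itertools import product


def central_moment(values, probs, r):
    """r-th moment of the parent about its mean."""
    mean = sum(p * v for v, p in zip(values, probs))
    return sum(p * (v - mean) ** r for v, p in zip(values, probs))


def expected_fourth_power(values, probs, n):
    """Exact E{S(x^2)}^4 over all ordered samples of n, x measured from the parent mean."""
    mean = sum(p * v for v, p in zip(values, probs))
    total = Fraction(0)
    for idx in product(range(len(values)), repeat=n):
        weight = Fraction(1)
        for i in idx:
            weight *= probs[i]
        total += weight * sum((values[i] - mean) ** 2 for i in idx) ** 4
    return total


def formula_mean_value(values, probs, n):
    """Mean value of {S(x^2)}^4 from the expansion in parent moments."""
    mu = {r: central_moment(values, probs, r) for r in (2, 4, 6, 8)}
    return (n * mu[8]
            + 4 * n * (n - 1) * mu[6] * mu[2]
            + 6 * n * (n - 1) * (n - 2) * mu[4] * mu[2] ** 2
            + 3 * n * (n - 1) * mu[4] ** 2
            + n * (n - 1) * (n - 2) * (n - 3) * mu[2] ** 4)


if __name__ == "__main__":
    values = [0, 1, 3]
    probs = [Fraction(1, 2), Fraction(1, 3), Fraction(1, 6)]
    print(expected_fourth_power(values, probs, 4) == formula_mean_value(values, probs, 4))
```

With $n = 4$ every type of term contributes, so agreement there tests all five coefficients at once.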

A similar evaluation of the other terms in (11.4) leads eventually to the result
3.1 

E{m\) = -- (jM. — jwi)® + --3(|Me — — 24/85//, — I5[il 4- 4- 96/8|/i, — 30//^) 

Tv 71 

-•“ 40/8,//, ~ 96/85/^1 — 54/8| 4- 336/84/8^ 4- 528/83/8, — 306/8^) 

Th 

+ — 176/i5//3 — 102/<4 .+ 924/^4/i| + 1232//^/^, — 1044/^2) 

- --(4/8, - 88/8,/8, - 160/8./8, - 95/«| 4- 1050/84//! + 1360/8!//, - 1395//!) 

t2»” 

+ — 28/46/iji — 56//6//8 — 36/i| + 420/i4/^2 + — 630/^2) • (11.6) 

Tl* 

11.4. Systematic investigations of the sampling moments on these lines (though by
a somewhat different method) were carried out by Tschuprow (1919) and, for the particular
case of the variance, by Church (1925), who corrected some misprints in Tschuprow’s results.
Unfortunately the resulting formulae are exceedingly complicated (the above is one of
the simpler cases) and are obviously unsuitable for practical work.

It then began to be appreciated that their complexity might be due to the use of a
special type of symmetric function of the observations, namely the moments, and the
question arose whether other functions might have simpler properties. Thiele had already
introduced the parameters which are now known as cumulants, and had defined some
statistics which were the same functions of the moment-statistics as the cumulants are
of the moments. He also gave some expressions for the sampling cumulants of these
functions. In 1928 C. C. Craig developed this work and gave a number of further results.
Even these, however, were sufficiently complicated and were reached only after some
labour, and Craig himself remarked that “it rather seems that the best hopes of effectively
further simplifying the problem of sampling for statistical characteristics lie either in the
discovery of a new kind of symmetric functions of all the observations ... or in the
abandonment of the method of characterizing frequency functions by symmetric functions
of the observations altogether.”

About the same time R. A. Fisher discovered such a new kind of symmetric function,
the $k$-statistics, and his remarkable paper of 1928 forms the basis of nearly all subsequent
work on the subject. The new statistics have the valuable property of yielding particularly
simple sampling formulae which can be obtained directly by combinatorial methods, obviating
most of the algebraic labour inherent in the older methods.

Seminvariant Statistics

11.5. It will be observed that equation (11.6) does not contain the parent-mean $\mu'_1$.
In deriving it we took an arbitrary origin at the parent mean, which simplified the algebra
to some extent. The independence of $E(m_2^4)$ of this parent mean is, however, not due to
this accidental circumstance. In fact any transformation of the variate from one origin
to another leaves $m_2$ unchanged, for $m_2 = \dfrac{1}{n}\sum(x - m'_1)^2$ and the transformation increases
each $x$ and $m'_1$ by the same amount, leaving their difference unaffected. Consequently
if $m_2$ is independent of the location of the origin, so must be its sampling moments. Thus
our sampling formulae are very much simplified if we use statistics which are independent
of the origin. In equation (11.6) there are terms corresponding to $\mu_8$, $\mu_6\mu_2$, $\mu_5\mu_3$, $\mu_4^2$,
$\mu_4\mu_2^2$, $\mu_3^2\mu_2$ and $\mu_2^4$. If we had to take account of possible terms in $\mu'_1$ there would be
additional terms, our formula containing 22 types of term instead of only 7.

11.6. A statistic which is independent of the origin of calculation is said to be seminvariant.
The moment-statistics about the mean are seminvariant. We now consider a
second family of statistics $k_p$ ($p = 1, 2, \ldots$), symmetric polynomials in the observations
such that the mean value of $k_p$ is the $p$th cumulant, i.e.

$$E(k_p) = \kappa_p. \tag{11.7}$$

Note first of all that $k_p$ is uniquely determined by this definition; for if there were
two functions $k_p$ and $k'_p$ obeying (11.7) their difference $k_p - k'_p$ would have a zero mean
value. But this difference is itself a symmetric function and can therefore be expressed
as the sum of terms $\sum x_j^a$, $\sum x_j^a x_k^b$, etc., and hence its mean value is a series of terms each
of which is a product of moments. The vanishing of this series would imply a relationship
among the moments which is impossible except perhaps for particular parent populations.
Hence $k_p - k'_p$ must vanish identically and thus $k_p = k'_p$.





Secondly, note that the $k$’s are in fact seminvariant, except for $k_1$ which is equal to
the mean $m'_1$. In fact, we have by Taylor’s theorem

$$k_p(x_1 + h, x_2 + h, \ldots x_n + h) = k_p(x_1, \ldots x_n) + \frac{h}{1!} D k_p(x_1, \ldots x_n) + \frac{h^2}{2!} D^2 k_p(x_1, x_2, \ldots x_n) + \ldots \tag{11.8}$$

where

$$D = \frac{\partial}{\partial x_1} + \frac{\partial}{\partial x_2} + \ldots + \frac{\partial}{\partial x_n}.$$

Taking mean values, and remembering that $\kappa_p$ itself is independent of the origin, except
for $\kappa_1$, we have

$$\kappa_p = \kappa_p + \frac{h}{1!} E(Dk_p) + \text{etc.} \tag{11.9}$$

Thus $E(Dk_p)$ and other terms on the right vanish separately, for (11.9) is an identity in $h$.
In virtue of the remark above, this implies that $Dk_p = 0$, $D^2k_p = 0$, and so on; and hence,
from (11.8),

$$k_p(x_1 + h, x_2 + h, \ldots x_n + h) = k_p(x_1, x_2, \ldots x_n),$$

i.e. $k_p$ is seminvariant. The exception to this rule is $k_1$, which has as its mean value $\kappa_1 = \mu'_1$
and thus

$$k_1 = \frac{1}{n}\sum(x). \tag{11.10}$$


11.7. We now proceed to find explicit expressions for the $k$-statistics in terms of the
observations $x_1 \ldots x_n$. By definition $k_p$ is of degree $p$ in these observations (for $\kappa_p$ is
of order $p$ in the moments, that is, the sum of the orders of the moments comprising any
term in $\kappa_p$ is $p$). We may then write

$$k_p = \sum \sum A\left(p_1^{\pi_1} p_2^{\pi_2} \ldots p_s^{\pi_s}\right)\, x_i^{p_1} x_j^{p_1} \ldots x_l^{p_2} \ldots x_q^{p_s}, \tag{11.11}$$

there being $\pi_1$ factors with index $p_1$, $\pi_2$ with index $p_2$, and so on,
where the second summation extends over all the ways of assigning the $\pi_1 + \pi_2 + \ldots + \pi_s$
subscripts (including permutations) from the $n$ available and the first summation extends
over all partitions of the number $p$; $A(p_1^{\pi_1} p_2^{\pi_2} \ldots p_s^{\pi_s})$ is a number
depending on the partition.

We have

$$p_1\pi_1 + p_2\pi_2 + \ldots + p_s\pi_s = p \tag{11.12}$$

and define $\rho$ by

$$\pi_1 + \pi_2 + \ldots + \pi_s = \rho. \tag{11.13}$$

On taking mean values of (11.11) we have, since the $x$’s are independent,

$$\kappa_p = \sum\left\{(\mu'_{p_1})^{\pi_1} \ldots (\mu'_{p_s})^{\pi_s}\, A\, B\right\} \tag{11.14}$$

where $B$ is the number of ways of picking out the $\rho$ subscripts from $n$, permutations allowed,
and is therefore equal to $n(n - 1) \ldots (n - \rho + 1) = n^{[\rho]}$.

Now from equation (3.31), we have

$$\kappa_p = p! \sum \left(\frac{\mu'_{p_1}}{p_1!}\right)^{\pi_1} \ldots \left(\frac{\mu'_{p_s}}{p_s!}\right)^{\pi_s} \frac{(-1)^{\rho - 1}(\rho - 1)!}{\pi_1! \ldots \pi_s!}, \tag{11.15}$$

the summation extending over all partitions subject to (11.12) and (11.13). On identifying
corresponding terms in (11.14) and (11.15) we find the values of the $A$’s and on substituting
in (11.11) obtain finally

$$k_p = p! \sum \frac{(-1)^{\rho - 1}(\rho - 1)!}{n^{[\rho]}} \cdot \frac{\sum\left(x_i^{p_1} x_j^{p_1} \ldots x_q^{p_s}\right)}{(p_1!)^{\pi_1} \ldots (p_s!)^{\pi_s}\, \pi_1! \ldots \pi_s!}, \tag{11.16}$$

the explicit expression of $k_p$ in terms of the $x$’s.

We may notice an important simplification of this expression which is crucial in a
discussion of the sampling properties of the $k$’s. Apart from factors in $\rho$ and $n$ a typical
term in (11.16) may be written

$$\frac{p!}{(p_1!)^{\pi_1} \ldots (p_s!)^{\pi_s}\, \pi_1! \ldots \pi_s!} \sum\left(x_i^{p_1} x_j^{p_1} \ldots x_q^{p_s}\right)$$

where, it is to be remembered, permutations of the subscripts are allowed. There will
be a term of this type corresponding to every partition of $\rho$ into $\pi$’s and of $p$ into $p$’s.
Consequently we may write

$$k_p = \sum \frac{(-1)^{\rho - 1}(\rho - 1)!}{n^{[\rho]}} \sum \sum \left(x_i^{p_1} x_j^{p_1} \ldots x_q^{p_s}\right) \tag{11.17}$$

where there is a term in the second summation corresponding to every possible way of
assigning the subscripts. In this assignment subscripts are regarded as distinct entities.
For example, if from the $n$ subscripts we choose $p_1$ to be 1, $p_1$ to be 2, . . . $p_2$ to be $\pi_1 + 1$,
and so on, there will be as many different terms as there are ways of choosing $p_1$ from
the $p$, and so on, i.e.

$$\frac{p!}{(p_1!)^{\pi_1} \ldots (p_s!)^{\pi_s}\, \pi_1! \ldots \pi_s!}. \tag{11.18}$$

In fact, (11.16) is a condensed form of (11.17) in which all the terms leading to the same
$x$-product are added together, their number being given by (11.18).


Expression of k-Statistics in terms of Symmetric Products and Sums

11.8. Writing

$$[p_1^{\pi_1}p_2^{\pi_2}\dots p_s^{\pi_s}] = \Sigma\big(x_i^{p_1}x_j^{p_1}\dots x_q^{p_s}\big), \qquad i \neq j \neq \dots \neq q \tag{11.19}$$

so that, for instance,

$$[21] = \Sigma(x_i^2x_j), \qquad i \neq j,$$

we see that the mean value of $[p_1^{\pi_1}\dots p_s^{\pi_s}]$ is $n^{[\rho]}(\mu'_{p_1})^{\pi_1}\dots(\mu'_{p_s})^{\pi_s}$. We can then write down the $k$'s in terms of the symmetric product sums $[p^{\pi}]$ at once from the expressions of cumulants in terms of moments. For instance, from (3.33) we have $\kappa_3 = \mu'_3 - 3\mu'_2\mu'_1 + 2\mu'^3_1$ and hence

$$k_3 = \frac{[3]}{n} - \frac{3[21]}{n(n-1)} + \frac{2[1^3]}{n(n-1)(n-2)} \tag{11.20}$$



a result which, of course, can be obtained directly from (11.16). In fact, there are three partitions of 3, namely (3), (21), and ($1^3$). From (11.16) we then have

$$k_3 = \frac{(-1)^0\,0!\,3!\,[3]}{(3!)\,1!\,n} + \frac{(-1)^1\,1!\,3!\,[21]}{n(n-1)\,2!\,1!\,1!\,1!} + \frac{(-1)^2\,2!\,3!\,[1^3]}{n(n-1)(n-2)\,(1!)^3\,3!}$$

$$= \frac{[3]}{n} - \frac{3[21]}{n(n-1)} + \frac{2[1^3]}{n(n-1)(n-2)}$$

as before.
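As an illustrative check (not part of the original text), the following Python sketch computes $k_3$ both from the symmetric product sums of (11.20) and from the power sums, using exact rational arithmetic. The function names and the sample values are hypothetical, chosen only for this example; the product sums are evaluated by brute force over ordered choices of distinct subscripts, which is practical only for small samples.

```python
from fractions import Fraction
from itertools import permutations


def product_sum(xs, parts):
    """Symmetric product sum [p1 p2 ...]: the sum of x_i^p1 * x_j^p2 * ...
    over all ordered choices of distinct subscripts i, j, ..."""
    total = Fraction(0)
    for idx in permutations(range(len(xs)), len(parts)):
        term = Fraction(1)
        for i, p in zip(idx, parts):
            term *= Fraction(xs[i]) ** p
        total += term
    return total


def k3_from_product_sums(xs):
    """k_3 as written in (11.20)."""
    n = len(xs)
    return (product_sum(xs, (3,)) / n
            - 3 * product_sum(xs, (2, 1)) / (n * (n - 1))
            + 2 * product_sum(xs, (1, 1, 1)) / (n * (n - 1) * (n - 2)))


def k3_from_power_sums(xs):
    """k_3 in terms of the power sums s_1, s_2, s_3."""
    n = len(xs)
    s1, s2, s3 = (sum(Fraction(x) ** r for x in xs) for r in (1, 2, 3))
    return (n * n * s3 - 3 * n * s2 * s1 + 2 * s1 ** 3) / (n * (n - 1) * (n - 2))


sample = [2, 3, 5, 7, 11]  # arbitrary illustrative data
```

Both routes should agree exactly, and shifting every observation by a constant should leave the value unchanged, since $k_3$ is seminvariant.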

It is, however, more useful for practical calculation of the $k$-statistics to express them in terms of the power sums defined by

$$s_r = \Sigma(x^r) \tag{11.21}$$

This can be done by expressing the product sums (11.19) in terms of power sums (a procedure which may be facilitated by the use of tables of symmetric functions) or directly as follows:—

Assume

$$k_3 = a_0s_3 + a_1s_2s_1 + a_2s_1^3.$$

Since $E(k_3) = \kappa_3 = \mu_3$ we have

$$\mu_3 = a_0E(s_3) + a_1E(s_2s_1) + a_2E(s_1^3).$$

Hence, for moments about an arbitrary point,

$$\mu'_3 - 3\mu'_2\mu'_1 + 2\mu'^3_1 = a_0(n\mu'_3) + a_1\{n\mu'_3 + n(n-1)\mu'_2\mu'_1\} + a_2\{n\mu'_3 + 3n(n-1)\mu'_2\mu'_1 + n(n-1)(n-2)\mu'^3_1\}$$

from which we find, identifying coefficients,

$$1 = n(a_0 + a_1 + a_2)$$

$$-3 = n(n-1)(a_1 + 3a_2)$$

$$2 = n(n-1)(n-2)a_2$$

whence, solving for $a_0$, $a_1$ and $a_2$, we find

$$a_0 = \frac{n}{(n-1)(n-2)}, \qquad a_1 = -\frac{3}{(n-1)(n-2)}, \qquad a_2 = \frac{2}{n(n-1)(n-2)}.$$
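The three equations obtained by identifying coefficients form a triangular system, so they can be solved by back-substitution. The short Python sketch below (the function name is an assumption made for this illustration) does this with exact fractions for a given sample size $n \geq 3$.

```python
from fractions import Fraction


def k3_coefficients(n):
    """Solve 1 = n(a0 + a1 + a2), -3 = n(n-1)(a1 + 3 a2),
    2 = n(n-1)(n-2) a2 by back-substitution, for a sample size n >= 3."""
    n = Fraction(n)
    a2 = 2 / (n * (n - 1) * (n - 2))
    a1 = -3 / (n * (n - 1)) - 3 * a2
    a0 = 1 / n - a1 - a2
    return a0, a1, a2
```

For every $n$ tried, the result agrees with the closed forms $n/\{(n-1)(n-2)\}$, $-3/\{(n-1)(n-2)\}$ and $2/\{n(n-1)(n-2)\}$.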


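11.9. The first eight $k$-statistics in terms of the power sums are as follows:—

$$k_1 = \frac{1}{n}s_1$$

$$k_2 = \frac{1}{n(n-1)}\{ns_2 - s_1^2\}$$

$$k_3 = \frac{1}{n(n-1)(n-2)}\{n^2s_3 - 3ns_2s_1 + 2s_1^3\}$$

$$k_4 = \frac{1}{n(n-1)(n-2)(n-3)}\{(n^3 + n^2)s_4 - 4(n^2 + n)s_3s_1 - 3(n^2 - n)s_2^2 + 12ns_2s_1^2 - 6s_1^4\}$$

$$k_5 = \frac{1}{n(n-1)\dots(n-4)}\{(n^4 + 5n^3)s_5 - 5(n^3 + 5n^2)s_4s_1 - 10(n^3 - n^2)s_3s_2 + 20(n^2 + 2n)s_3s_1^2 + 30(n^2 - n)s_2^2s_1 - 60ns_2s_1^3 + 24s_1^5\}$$

$$\begin{aligned}
k_6 = \frac{1}{n(n-1)\dots(n-5)}\{&(n^5 + 16n^4 + 11n^3 - 4n^2)s_6 - 6(n^4 + 16n^3 + 11n^2 - 4n)s_5s_1\\
&- 15(n^4 + 2n^3 - 7n^2 + 4n)s_4s_2 - 10(n^4 - 2n^3 + 5n^2 - 4n)s_3^2\\
&+ 30(n^3 + 9n^2 + 2n)s_4s_1^2 + 120(n^3 - n)s_3s_2s_1 + 30(n^3 - 3n^2 + 2n)s_2^3\\
&- 120(n^2 + 3n)s_3s_1^3 - 270(n^2 - n)s_2^2s_1^2 + 360ns_2s_1^4 - 120s_1^6\}
\end{aligned} \tag{11.22}$$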

jfc, =* + 42n» + HSw* - 42n»)a, - 7(n» + 42»*'+ 119n» - 42n*)«,s, 

— 21(n^ + 12n* — 31n* + 35(n* + 

+ 42(71^ + 27n* + 44n^ — 12n)8iSf + 210(n* + 

+ 140{n* + 5^® — + 210(n* — 3n^ + 2^2)534 

— 210(n* + 13n* + 6n)$4t9j — 1260(n* ^ ^ 2n)8^s^\ 

— 630(w* — 3?i* + 2n)525i + 840(n® + 4w)«8sJ + 2520(n* — 

— 252()n«a«f + 720«i} 


^[7J 








{{n 


,7 + 99;i6 ^ 757^.6 + 14X^4 _ 39g^8 + I20r^a)58 ^ 8(n« + 99>t«+757w« 
+ 141ti« - 39871* + 1207fc)^7.<?i 28{no + 377i*^ - 39?^^ - 15771* 

+ 2787i* — 1207i)58.v, — 56(71* + 97?.* — 23n* -j- 11177* — 2187i* + 1207^)^^^a 

— 35(71* + 71* + 3371* - 12171* + 20671* -- 120704 + 56(7i* + 687?.* + 3597?.* 
871* — 607l)«a4 +* 336(71* + 2371* — 3171* — 2371* + 3077),V5^85i 

4“ 560(71* + 571* + 571* + 57?.* -* %n)s^8iSx + 420(71* + 277* — 2577* 

+ 467?.* — 2477)^44 + 560(77* — 477* + 1177* — 2077* + 127?)4^2 

— 336(77* + 3877* + 9977* - 1877)^*4 - 2520(77* + 1077* - 1777*'+ 677)54^*4 

^ 1680(77* + 277* + 777* — 1077)44 5040(77* — 277* — 77* + 277)<53,S'2^l 

— 630(7?.* — 677* + 1177* — 677)4 + 1680(77* + 1777* + 1277).944 
+ 13,440(77* + 277* — 377)^3524 + 10,080(77* — 377* + 2 n ) sls \ 

-- 6720(77* + 577)^34 25,200(77* - n)sls\ + 20,16077^24 5040^J} 

In particular, we have

$$\left.\begin{aligned}
k_1 &= m'_1\\
k_2 &= \frac{n}{n-1}m_2\\
k_3 &= \frac{n^2}{(n-1)(n-2)}m_3\\
k_4 &= \frac{n^2}{(n-1)(n-2)(n-3)}\{(n+1)m_4 - 3(n-1)m_2^2\}
\end{aligned}\right\} \tag{11.23}$$

expressing the $k$'s in terms of the moment statistics.
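To make the two sets of formulae concrete, here is a minimal Python sketch that evaluates $k_1, \dots, k_4$ once from the power sums and once from the moment statistics of (11.23), in exact arithmetic. The function names are assumptions of this illustration, and a sample of at least four observations is required because of the factor $(n-3)$ in the denominator of $k_4$.

```python
from fractions import Fraction


def k_from_power_sums(xs):
    """k_1..k_4 from the power sums s_r = sum of x^r."""
    xs = [Fraction(x) for x in xs]
    n = len(xs)
    if n < 4:
        raise ValueError("k_4 needs at least four observations")
    s1, s2, s3, s4 = (sum(x ** r for x in xs) for r in (1, 2, 3, 4))
    k1 = s1 / n
    k2 = (n * s2 - s1 ** 2) / (n * (n - 1))
    k3 = (n ** 2 * s3 - 3 * n * s2 * s1 + 2 * s1 ** 3) / (n * (n - 1) * (n - 2))
    k4 = ((n ** 3 + n ** 2) * s4 - 4 * (n ** 2 + n) * s3 * s1
          - 3 * (n ** 2 - n) * s2 ** 2 + 12 * n * s2 * s1 ** 2
          - 6 * s1 ** 4) / (n * (n - 1) * (n - 2) * (n - 3))
    return k1, k2, k3, k4


def k_from_moment_statistics(xs):
    """k_1..k_4 from the sample moments about the sample mean, as in (11.23)."""
    xs = [Fraction(x) for x in xs]
    n = len(xs)
    mean = sum(xs) / n
    m2, m3, m4 = (sum((x - mean) ** r for x in xs) / n for r in (2, 3, 4))
    k2 = n * m2 / (n - 1)
    k3 = n ** 2 * m3 / ((n - 1) * (n - 2))
    k4 = n ** 2 * ((n + 1) * m4 - 3 * (n - 1) * m2 ** 2) / ((n - 1) * (n - 2) * (n - 3))
    return mean, k2, k3, k4
```

The two functions should return identical values for any sample, and $k_2$, $k_3$, $k_4$ should be unaffected by a change of origin.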


11.10. There is a well-known theorem of symmetric functions which states that any rational integral algebraic symmetric function of $x_1, \dots, x_n$ can be expressed uniquely, rationally, integrally and algebraically in terms of the symmetric sums $s_r$. It can thus be so expressed in terms of the $k$'s, for from equations such as (11.22) the $s$'s can be so expressed in terms of the $k$'s. Thus an investigation of the sampling constants of any symmetric function expressible in terms of rational integral algebraic symmetric functions can be translated into an investigation concerning the $k$'s.

To round off this account of the relationship between the $k$'s and the $s$'s we may refer to two interesting operational properties. Write $K_p$ for the same function of the differential operators $\partial/\partial x_i$ as $k_p$ is of the $x$'s and $S_p$ for the same function of the operators as $s_p$ is of the $x$'s. Then

$$\left.\begin{aligned}
K_ps_p &= p!\\
K_p(s_{p_1}s_{p_2}\dots s_{p_m}) &= 0
\end{aligned}\right\} \tag{11.24}$$

where $(p_1, p_2, \dots p_m)$ is any partition of $p$ other than $p$ itself; and

$$\left.\begin{aligned}
S_pk_p &= p!\\
S_qk_p &= 0, \qquad q \neq p
\end{aligned}\right\} \tag{11.25}$$


Methods of proof and applications of these results are given in the exercises at the 
end of the chapter. 


Sampling Cumulants of k-Statistics

11.11. The problem of determining the sampling moments or the sampling cumulants of $k$-statistics is that of finding mean values of powers and products of those statistics. To any number $a$ with partition $(a_1^{\alpha_1}a_2^{\alpha_2}\dots a_s^{\alpha_s})$ there will correspond a moment

$$\mu'(a_1^{\alpha_1}\dots a_s^{\alpha_s}) = E(k_{a_1}^{\alpha_1}\dots k_{a_s}^{\alpha_s}) \tag{11.26}$$

and a cumulant $\kappa(a_1^{\alpha_1}\dots a_s^{\alpha_s})$ related to the moments by the identity (cf. equation (3.64))

$$\Sigma\left\{\kappa(a_1^{\alpha_1}\dots a_s^{\alpha_s})\frac{\theta_1^{\alpha_1}}{\alpha_1!}\dots\frac{\theta_s^{\alpha_s}}{\alpha_s!}\right\} = \log\Sigma\left\{\mu'(a_1^{\alpha_1}\dots a_s^{\alpha_s})\frac{\theta_1^{\alpha_1}}{\alpha_1!}\dots\frac{\theta_s^{\alpha_s}}{\alpha_s!}\right\} \tag{11.27}$$

For example, the fourth cumulant of $k_2$ will correspond to the fourth moment of $k_2$, which is the mean value of $k_2^4$. These quantities will be written $\kappa(2^4)$ and $\mu'(2^4)$, in accordance with (11.26). Again the cumulant $\kappa(32)$ corresponds to the moment $\mu'(32)$, the mean value of $k_3k_2$, and is their covariance in their joint sampling distribution. Generally, in the simultaneous distribution of the $k$'s there will be a separate formula of degree $a$ for every partition of $a$.

Now the product $k_{a_1}^{\alpha_1}\dots k_{a_s}^{\alpha_s}$ is homogeneous and of total degree $a$ in the $x$'s. Hence, when mean values are taken, $\mu'(a_1^{\alpha_1}\dots a_s^{\alpha_s})$ will be homogeneous and of total order $a$ in the parent $\mu'$'s. Since the $\kappa$'s themselves are of homogeneous order in the $\mu'$'s it follows that $\kappa(a_1^{\alpha_1}\dots a_s^{\alpha_s})$ is of homogeneous order in the $\kappa$'s. Hence we get the first rule for the sampling of $k$-statistics (which is true of seminvariants generally):—

Rule 1. $\kappa(a_1^{\alpha_1}\dots a_s^{\alpha_s})$ consists of the sum of terms each of which, except for constants, is a product of parent $\kappa$'s of order $a$.

For instance, $\kappa(2^4)$ is of total order 8 and is therefore the sum of terms in $\kappa_8$, $\kappa_6\kappa_2$, $\kappa_5\kappa_3$, $\kappa_4^2$, $\kappa_4\kappa_2^2$, $\kappa_3^2\kappa_2$ and $\kappa_2^4$. Similarly $\kappa(32)$ will contain a term in $\kappa_5$ and one in $\kappa_3\kappa_2$ and no others. As seen in the next rule, no terms in $\kappa_1$ appear (as again is true of seminvariants generally).

Rule 2. No term in $\kappa(a_1^{\alpha_1}\dots a_s^{\alpha_s})$ contains $\kappa_1$, except $\kappa(1)$ itself.

This follows as in 11.5. The $k$-statistics are seminvariant and hence their sampling distribution cannot depend on the variable quantity $\kappa_1$. The exception occurs when we are dealing with the only statistic which is dependent on the origin, namely $k_1$, and here $\kappa(1) = \kappa_1$, as is evident from the definitions.


11.12. We now enunciate and illustrate the rules by which the terms in $\kappa(a_1^{\alpha_1}\dots a_s^{\alpha_s})$ can be found. As the proof of the validity of the rules is difficult to grasp until their nature has been comprehended we defer a proof until later in the chapter.



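To find the term in $\kappa_{b_1}^{\beta_1}\dots\kappa_{b_r}^{\beta_r}$ in $\kappa(a_1^{\alpha_1}\dots a_s^{\alpha_s})$ consider the two-way array

$$\begin{array}{cccc|c}
\cdot & \cdot & \cdots & \cdot & b_1\\
\cdot & \cdot & \cdots & \cdot & b_1\\
\vdots & \vdots & & \vdots & \vdots\\
\cdot & \cdot & \cdots & \cdot & b_r\\
\hline
a_1 & a_1 & \cdots & a_s & a
\end{array} \tag{11.28}$$

where there is a row corresponding to every $\kappa$ in the term $\kappa_{b_1}^{\beta_1}\dots\kappa_{b_r}^{\beta_r}$ and a column corresponding to every part in $(a_1^{\alpha_1}\dots a_s^{\alpha_s})$. Consider the various ways in which the body of the table can be completed by the insertion of numbers whose row and column sums are the respective $b$ and $a$ numbers; e.g. if we are seeking the coefficient of $\kappa_6\kappa_2^2$ in $\kappa(4^22)$ we shall consider such arrays as

$$\begin{array}{ccc|c}
2 & 2 & 2 & 6\\
1 & 1 & \cdot & 2\\
1 & 1 & \cdot & 2\\
\hline
4 & 4 & 2 & 10
\end{array}
\qquad
\begin{array}{ccc|c}
2 & 3 & 1 & 6\\
1 & 1 & \cdot & 2\\
1 & \cdot & 1 & 2\\
\hline
4 & 4 & 2 & 10
\end{array}
\qquad
\begin{array}{ccc|c}
3 & 3 & \cdot & 6\\
1 & \cdot & 1 & 2\\
\cdot & 1 & 1 & 2\\
\hline
4 & 4 & 2 & 10
\end{array} \tag{11.29}$$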

Then the rules by which these arrays give the coefficients of $\kappa_{b_1}^{\beta_1}\dots\kappa_{b_r}^{\beta_r}$ are as follows:

Rule 3. Every array in which the numbers in the body of the array fall into two or more blocks, each confined to separate rows or columns, is to be ignored.

For instance, in the foregoing example

$$\begin{array}{ccc|c}
4 & 2 & \cdot & 6\\
\cdot & 2 & \cdot & 2\\
\cdot & \cdot & 2 & 2\\
\hline
4 & 4 & 2 & 10
\end{array}$$

is to be ignored, since the 2 × 2 block in the top left-hand corner has no row or column number in common with the entry in the bottom right-hand corner.

Rule 4. Subject to the ignoration of terms enjoined by Rule 3, to the coefficient of $\kappa_{b_1}^{\beta_1}\dots\kappa_{b_r}^{\beta_r}$ in $\kappa(a_1^{\alpha_1}\dots a_s^{\alpha_s})$ there will be a contribution corresponding to each way of completing the array (11.28). Such of these as do not vanish are composed of a numerical coefficient multiplied by a function of $n$.

Rule 5. The numerical coefficient is the number of ways in which the column totals, considered as composed of distinct individuals, can be allocated to form the array concerned, divided by $\beta_1!\,\beta_2!\dots\beta_r!$.

Rule 6. The function of $n$, called the pattern function, depends only on the configuration of zeros in the array, not on the actual numbers composing it or on the row and column totals. The function is given by considering the separations of the rows into distinct groups or separates.

(i) With one separate there is associated the number $n$, with two separates $n(n-1)$, with $q$ separates $n(n-1)\dots(n-q+1)$.

(ii) In each separation we count the number of separates in which a particular column is represented by a non-zero entry. If in $\rho$ separates, we assign the factor

$$\frac{(-1)^{\rho-1}(\rho-1)!}{n(n-1)\dots(n-\rho+1)}.$$

(iii) This is done for each column.

(iv) The various factors given by (ii) and (iii) are multiplied together for each separation, multiplied by the factor appropriate under (i) and the results summed to give the pattern function.

Rule 7. Any array containing a row which consists of a single non-zero entry has a vanishing pattern function and is to be ignored.

Rule 8. Any array containing a column which consists of a single non-zero entry has a pattern function $\dfrac{1}{n}$ times that of the array obtained by omitting that column.

Rule 9. Any array the non-zero elements of which consist of two groups connected only by a single column has a vanishing pattern function and is to be ignored.
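Rule 6 is mechanical enough to be written as a short program. The following Python sketch is an illustration only; the helper names `set_partitions`, `falling` and `pattern_function` are assumptions of this example. It evaluates the pattern function of an array at a numerical sample size $n$ by summing over every separation of the rows. It assumes that every column contains at least one non-zero entry and that $n$ is at least the number of rows, so that no factor $n(n-1)\dots(n-\rho+1)$ vanishes.

```python
from fractions import Fraction
from math import factorial


def set_partitions(items):
    """Yield every separation of `items` into non-empty groups (separates)."""
    if not items:
        yield []
        return
    first, rest = items[0], items[1:]
    for partition in set_partitions(rest):
        # put `first` into each existing separate in turn ...
        for i in range(len(partition)):
            yield partition[:i] + [[first] + partition[i]] + partition[i + 1:]
        # ... or into a new separate of its own
        yield [[first]] + partition


def falling(n, r):
    """n(n-1)...(n-r+1) as an exact fraction."""
    out = Fraction(1)
    for j in range(r):
        out *= n - j
    return out


def pattern_function(array, n):
    """Pattern function of Rule 6 for `array` (a list of rows) at sample size n."""
    rows = list(range(len(array)))
    total = Fraction(0)
    for separation in set_partitions(rows):
        term = falling(n, len(separation))            # Rule 6 (i)
        for c in range(len(array[0])):                # Rule 6 (ii) and (iii)
            rho = sum(any(array[r][c] != 0 for r in group) for group in separation)
            term *= (-1) ** (rho - 1) * factorial(rho - 1) / falling(n, rho)
        total += term                                 # Rule 6 (iv)
    return total
```

Only the positions of the zeros matter, so an array may be passed with 1 for every non-zero entry. Small cases of Rules 7 and 8 can be confirmed numerically in this way; this is a check on particular arrays, not a proof of the rules.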


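Example 11.1

As an illustration of these rules (which are not as difficult as they look), suppose we seek the coefficient of $\kappa_6\kappa_2^2$ in $\kappa(4^22)$. If the reader will write down the thirty or so possible arrays with column totals 4, 4, 2 and row totals 6, 2, 2, he will find that the only ones which do not vanish are those of (11.29) and permutations of rows and columns with the same sum, namely

$$\underset{(a)}{\begin{array}{ccc|c}
2 & 2 & 2 & 6\\
1 & 1 & \cdot & 2\\
1 & 1 & \cdot & 2\\
\hline
4 & 4 & 2 & 10
\end{array}}
\quad
\underset{(b)}{\begin{array}{ccc|c}
2 & 3 & 1 & 6\\
1 & 1 & \cdot & 2\\
1 & \cdot & 1 & 2\\
\hline
4 & 4 & 2 & 10
\end{array}}
\quad
\underset{(c)}{\begin{array}{ccc|c}
3 & 2 & 1 & 6\\
1 & 1 & \cdot & 2\\
\cdot & 1 & 1 & 2\\
\hline
4 & 4 & 2 & 10
\end{array}}
\quad
\underset{(d)}{\begin{array}{ccc|c}
2 & 3 & 1 & 6\\
1 & \cdot & 1 & 2\\
1 & 1 & \cdot & 2\\
\hline
4 & 4 & 2 & 10
\end{array}}$$

$$\underset{(e)}{\begin{array}{ccc|c}
3 & 2 & 1 & 6\\
\cdot & 1 & 1 & 2\\
1 & 1 & \cdot & 2\\
\hline
4 & 4 & 2 & 10
\end{array}}
\quad
\underset{(f)}{\begin{array}{ccc|c}
3 & 3 & \cdot & 6\\
1 & \cdot & 1 & 2\\
\cdot & 1 & 1 & 2\\
\hline
4 & 4 & 2 & 10
\end{array}}
\quad
\underset{(g)}{\begin{array}{ccc|c}
3 & 3 & \cdot & 6\\
\cdot & 1 & 1 & 2\\
1 & \cdot & 1 & 2\\
\hline
4 & 4 & 2 & 10
\end{array}} \tag{11.30}$$

With practice the reader will find it unnecessary to write down arrays such as (c), (d) and (e), which are merely obtained from (b) by permuting rows and columns, but for clarity at this stage they have been set out in full. There is one trap here to be particularly noticed. In array (b) the two columns summing to 4 and the two rows summing to 2 are different, and their permutations result in 4 different arrays. But in array (f), though the rows and columns are different, there are only 2 different arrays.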

Each of these arrays contributes to the coefficient required. Consider first of all that from (a). The numerical coefficient is

$$\left(\frac{4!}{2!\,1!\,1!}\right)\left(\frac{4!}{2!\,1!\,1!}\right)\left(\frac{2!}{2!}\right)\frac{1}{2!} = 72.$$

The first factor in brackets is the number of ways of allocating 4 individuals in the partition 2, 1, 1, similarly for the second, and we divide by 2! since there are 2 members of the row totals the same, this being the only $\beta$ factor.

Under Rule 8, the pattern function is $\dfrac{1}{n}$ times that of

$$\begin{array}{cc}
\times & \times\\
\times & \times\\
\times & \times
\end{array}$$

There are five separations of this, one of one separate, three of two separates and one of three separates. The contributions respectively under Rule 6 will be found to be

$$n\left(\frac{1}{n}\right)\left(\frac{1}{n}\right) = \frac{1}{n}$$

$$3n(n-1)\left\{\frac{-1}{n(n-1)}\right\}\left\{\frac{-1}{n(n-1)}\right\} = \frac{3}{n(n-1)}$$

$$n(n-1)(n-2)\left\{\frac{(-1)^2\,2!}{n(n-1)(n-2)}\right\}\left\{\frac{(-1)^2\,2!}{n(n-1)(n-2)}\right\} = \frac{4}{n(n-1)(n-2)}$$

The sum of these is $\dfrac{n}{(n-1)(n-2)}$ and hence the contribution from array (a) in (11.30) is

$$\frac{72}{(n-1)(n-2)}.$$

Now for arrays (b) to (e), which have all the same numerical factor and the same pattern function and can therefore be considered together. For any one the numerical factor is

$$\left(\frac{4!}{2!\,1!\,1!}\right)\left(\frac{4!}{3!\,1!}\right)\left(\frac{2!}{1!\,1!}\right)\frac{1}{2!} = 48$$

and that of the four together is thus 192.

Under Rule 6 the pattern function will depend on the configuration

$$\begin{array}{ccc}
\times & \times & \times\\
\times & \times & \cdot\\
\times & \cdot & \times
\end{array}$$

where $\times$ stands for a non-zero entry and a period for a zero entry. There are five separations of this, one of one separate, three of two separates, and one of three separates. The contribution from the first is

$$n\cdot\frac{1}{n}\cdot\frac{1}{n}\cdot\frac{1}{n} = \frac{1}{n^2}$$

for each column has a non-zero entry in the separate. The contribution from the three separations given respectively by isolating the first, second and third row will be found to be

$$n(n-1)\left\{-\frac{1}{n^3(n-1)^3} + \frac{1}{n^3(n-1)^2} + \frac{1}{n^3(n-1)^2}\right\} = \frac{2n-3}{n^2(n-1)^2}.$$

The contribution from the separation of three separates is

$$n(n-1)(n-2)\cdot\frac{2!}{n(n-1)(n-2)}\cdot\frac{-1}{n(n-1)}\cdot\frac{-1}{n(n-1)} = \frac{2}{n^2(n-1)^2}.$$

The pattern function is the sum of these three contributions and is thus

$$\frac{1}{(n-1)^2}.$$

The contribution from arrays (f) and (g) in (11.30) will be found to be $\dfrac{32}{(n-1)^2}$.

Hence, adding all the contributions together, we find that the coefficient of $\kappa_6\kappa_2^2$ in $\kappa(4^22)$ is

$$\frac{72}{(n-1)(n-2)} + \frac{192}{(n-1)^2} + \frac{32}{(n-1)^2} = \frac{8(37n - 65)}{(n-1)^2(n-2)},$$

as shown in equation (11.62) below.
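The whole of Example 11.1 can be reproduced by brute force. The Python sketch below is an illustration under stated assumptions (all function names are hypothetical, and the value is obtained at a numerical $n \geq 3$, not as a symbolic function of $n$). It enumerates every array with the given row and column totals, discards those falling into separate blocks (Rule 3), and adds up numerical coefficient times pattern function (Rules 4 to 6). Arrays covered by Rules 7 to 9 need no special treatment here, since their pattern functions come out as zero in the sum.

```python
from collections import Counter
from fractions import Fraction
from itertools import product
from math import factorial


def set_partitions(items):
    """Every separation of `items` into non-empty groups."""
    if not items:
        yield []
        return
    first, rest = items[0], items[1:]
    for part in set_partitions(rest):
        for i in range(len(part)):
            yield part[:i] + [[first] + part[i]] + part[i + 1:]
        yield [[first]] + part


def falling(n, r):
    out = Fraction(1)
    for j in range(r):
        out *= n - j
    return out


def pattern_function(array, n):
    """Rule 6: sum over separations of the rows."""
    total = Fraction(0)
    for sep in set_partitions(list(range(len(array)))):
        term = falling(n, len(sep))
        for c in range(len(array[0])):
            rho = sum(any(array[r][c] for r in group) for group in sep)
            term *= (-1) ** (rho - 1) * factorial(rho - 1) / falling(n, rho)
        total += term
    return total


def compositions(total, parts):
    """All tuples of `parts` non-negative integers adding up to `total`."""
    if parts == 1:
        yield (total,)
        return
    for first in range(total + 1):
        for rest in compositions(total - first, parts - 1):
            yield (first,) + rest


def arrays_with_margins(row_sums, col_sums):
    """All non-negative integer arrays with the given row and column totals."""
    columns = [list(compositions(c, len(row_sums))) for c in col_sums]
    for choice in product(*columns):
        rows = [tuple(col[i] for col in choice) for i in range(len(row_sums))]
        if all(sum(r) == t for r, t in zip(rows, row_sums)):
            yield rows


def is_connected(array):
    """Rule 3: the rows must hang together through shared non-zero columns."""
    reached, frontier = {0}, [0]
    while frontier:
        r = frontier.pop()
        for c, value in enumerate(array[r]):
            if value:
                for r2 in range(len(array)):
                    if array[r2][c] and r2 not in reached:
                        reached.add(r2)
                        frontier.append(r2)
    return len(reached) == len(array)


def coefficient(row_sums, col_sums, n):
    """Coefficient of the kappa-product with suffixes `row_sums` in the
    cumulant whose parts are `col_sums`, at sample size n."""
    beta = 1
    for multiplicity in Counter(row_sums).values():
        beta *= factorial(multiplicity)               # Rule 5 divisor
    total = Fraction(0)
    for array in arrays_with_margins(row_sums, col_sums):
        if not is_connected(array):
            continue
        ways = 1
        for c, col_total in enumerate(col_sums):      # Rule 5 allocations
            ways *= factorial(col_total)
            for row in array:
                ways //= factorial(row[c])
        total += Fraction(ways, beta) * pattern_function(array, n)
    return total
```

For instance, `coefficient((6, 2, 2), (4, 4, 2), 5)` should reproduce $8(37n - 65)/\{(n-1)^2(n-2)\}$ evaluated at $n = 5$, and the same function gives the two coefficients of $\kappa(2^2)$.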


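11.13. Rule 10. The expression for any $\kappa(a_1^{\alpha_1}\dots)$ which contains a unit part may be obtained from that without the part by (1) dividing throughout by $n$ and (2) increasing the suffix of one of the $\kappa$'s by unity in every possible way.

For example, it may be shown that

$$\kappa(2^2) = \frac{\kappa_4}{n} + \frac{2\kappa_2^2}{n-1}.$$

Hence

$$\kappa(2^21) = \frac{\kappa_5}{n^2} + \frac{4\kappa_3\kappa_2}{n(n-1)}$$

$$\kappa(2^21^2) = \frac{\kappa_6}{n^3} + \frac{4\kappa_4\kappa_2}{n^2(n-1)} + \frac{4\kappa_3^2}{n^2(n-1)}$$

and so on.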
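Rule 10 is a purely formal operation on an expression in the parent cumulants, and a few lines of Python make it explicit. In this sketch (a hypothetical representation chosen for illustration) an expression is a dictionary mapping a tuple of $\kappa$ suffixes, written in descending order, to its coefficient at a numerical sample size $n$.

```python
from collections import defaultdict
from fractions import Fraction


def add_unit_part(expr, n):
    """Apply Rule 10: divide throughout by n and raise the suffix of one
    kappa by unity in every possible way.  `expr` maps a tuple of kappa
    suffixes, e.g. (3, 2) for kappa_3 kappa_2, to its coefficient."""
    out = defaultdict(Fraction)
    for suffixes, coeff in expr.items():
        for i in range(len(suffixes)):
            raised = list(suffixes)
            raised[i] += 1
            out[tuple(sorted(raised, reverse=True))] += coeff / n
    return dict(out)


n = 10
kappa_2_2 = {(4,): Fraction(1, n), (2, 2): Fraction(2, n - 1)}
kappa_2_2_1 = add_unit_part(kappa_2_2, n)
kappa_2_2_1_1 = add_unit_part(kappa_2_2_1, n)
```

Starting from $\kappa(2^2)$, one application gives the coefficients of $\kappa(2^21)$ and a second gives those of $\kappa(2^21^2)$; the repeated factor $\kappa_2^2$ is raised once for each of its two members, which is where the coefficient 4 comes from.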


11.14. The reader may be inclined to doubt whether this rather elaborate combinatorial procedure represents much of an advance on the straightforward algebraical approach considered earlier in the chapter. A few trials of the two methods in particular cases will soon convert him to the former. The division of the coefficients into a numerical factor and a pattern function greatly simplifies the method and in fact all the functions likely to be required for practical purposes have been tabulated by Fisher (1928) or can be derived therefrom by an iterative process given by Fisher and Wishart (cf. Exercise 11.11).


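Example 11.2

To find the variance of the second moment statistic $m_2$.

From (11.23) we have

$$k_2 = \frac{n}{n-1}m_2.$$

Hence

$$\operatorname{var} m_2 = \left(\frac{n-1}{n}\right)^2\kappa(2^2).$$

$\kappa(2^2)$ consists of two terms, one in $\kappa_4$ and one in $\kappa_2^2$. The only array contributing to the first is

$$\begin{array}{cc|c}
2 & 2 & 4\\
\hline
2 & 2 & 4
\end{array}$$

with a numerical factor unity and a pattern function $\dfrac{1}{n}$. The arrays giving the second are of type

$$\begin{array}{cc|c}
\cdot & \cdot & 2\\
\cdot & \cdot & 2\\
\hline
2 & 2 & 4
\end{array}$$

If any entry in this were a 2 the row in which it appeared would contain only a single entry and hence the array would vanish. The only contributing array is therefore

$$\begin{array}{cc|c}
1 & 1 & 2\\
1 & 1 & 2\\
\hline
2 & 2 & 4
\end{array}$$

The numerical coefficient is $\left(\dfrac{2!}{1!\,1!}\right)^2\dfrac{1}{2!} = 2$. The pattern function will be found to be $\dfrac{1}{n-1}$. Hence

$$\kappa(2^2) = \frac{\kappa_4}{n} + \frac{2\kappa_2^2}{n-1}$$

$$\operatorname{var} m_2 = \left(\frac{n-1}{n}\right)^2\left(\frac{\kappa_4}{n} + \frac{2\kappa_2^2}{n-1}\right).$$

As $n$ becomes large this result tends to

$$\frac{1}{n}(\kappa_4 + 2\kappa_2^2) = \frac{1}{n}(\mu_4 - \mu_2^2),$$

confirming the approximation given by equation (9.9).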
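The exact formula for $\operatorname{var} m_2$ can be checked without simulation for a parent that takes only the values 0 and 1, since every possible sample can then be enumerated. The Python sketch below is an illustration, not part of the original example: the two-valued parent and the function names are assumptions made here, and the cumulants used are those of a 0/1 variable with $P(1) = p$, namely $\kappa_2 = p(1-p)$ and $\kappa_4 = \kappa_2(1 - 6\kappa_2)$.

```python
from fractions import Fraction
from itertools import product


def exact_var_m2(p, n):
    """Exact variance of m2 over all 2**n samples from a 0/1 parent."""
    mean = mean_sq = Fraction(0)
    for sample in product((0, 1), repeat=n):
        ones = sum(sample)
        prob = p ** ones * (1 - p) ** (n - ones)
        xbar = Fraction(ones, n)
        m2 = sum((x - xbar) ** 2 for x in sample) / n
        mean += prob * m2
        mean_sq += prob * m2 ** 2
    return mean_sq - mean ** 2


def formula_var_m2(p, n):
    """((n-1)/n)^2 * (kappa_4/n + 2 kappa_2^2/(n-1)) for the same parent."""
    k2 = p * (1 - p)
    k4 = k2 * (1 - 6 * k2)
    return Fraction(n - 1, n) ** 2 * (k4 / n + 2 * k2 ** 2 / (n - 1))
```

With `p = Fraction(1, 3)` and a small `n` the two functions should agree exactly.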


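Example 11.3

To find the third moment of $k_2$ we require $\kappa(2^3)$. This will be the sum of terms in $\kappa_6$, $\kappa_4\kappa_2$, $\kappa_3^2$ and $\kappa_2^3$.

The coefficient of the first is $\dfrac{1}{n^2}$. For the second we have to consider the array

$$\begin{array}{ccc|c}
1 & 1 & 2 & 4\\
1 & 1 & \cdot & 2\\
\hline
2 & 2 & 2 & 6
\end{array}$$

all others vanishing except the two equivalent partitions obtained when the column with the single entry appears in the first or second place. The numerical factor is then

$$3\left(\frac{2!}{1!\,1!}\right)^2 = 12.$$

The pattern function is $\dfrac{1}{n}$ times that of

$$\begin{array}{cc}
\times & \times\\
\times & \times
\end{array}$$

i.e. is $\dfrac{1}{n(n-1)}$. The coefficient of $\kappa_4\kappa_2$ is then $\dfrac{12}{n(n-1)}$.

For the term in $\kappa_3^2$ the only contributory array is

$$\begin{array}{ccc|c}
1 & 1 & 1 & 3\\
1 & 1 & 1 & 3\\
\hline
2 & 2 & 2 & 6
\end{array}$$

with a factor $\left(\dfrac{2!}{1!\,1!}\right)^3\dfrac{1}{2!} = 4$ and pattern function $\dfrac{n-2}{n(n-1)^2}$.

For the last term we have to consider the array

$$\begin{array}{ccc|c}
1 & 1 & \cdot & 2\\
1 & \cdot & 1 & 2\\
\cdot & 1 & 1 & 2\\
\hline
2 & 2 & 2 & 6
\end{array}$$

with a numerical coefficient 8 and a pattern function $\dfrac{1}{(n-1)^2}$. Collecting terms together we get

$$\kappa(2^3) = \frac{\kappa_6}{n^2} + \frac{12\kappa_4\kappa_2}{n(n-1)} + \frac{4(n-2)}{n(n-1)^2}\kappa_3^2 + \frac{8\kappa_2^3}{(n-1)^2}.$$

This is also the value of the third moment $\mu(2^3)$ measured about the mean of the sampling distribution, $\kappa_2$. We see that if the parent is normal the third moment reduces to $\dfrac{8\kappa_2^3}{(n-1)^2}$, i.e. is of order $n^{-2}$, indicating a rapid tendency towards symmetry.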
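The expression for $\kappa(2^3)$ can likewise be compared with an exact enumeration for a two-valued parent. In the Python sketch below (illustrative only; the 0/1 parent and all function names are assumptions of this example) the parent cumulants are obtained from the moments about the origin by the standard recursion $\kappa_r = \mu'_r - \sum_{j=1}^{r-1}\binom{r-1}{j-1}\kappa_j\mu'_{r-j}$, and the third moment of $k_2$ about its mean is computed directly over all $2^n$ samples.

```python
from fractions import Fraction
from itertools import product
from math import comb


def cumulants_from_raw_moments(mu):
    """mu[r] is the r-th moment about the origin (mu[0] = 1);
    returns the list kappa with kappa[r] the r-th cumulant."""
    kappa = [Fraction(0)] * len(mu)
    for r in range(1, len(mu)):
        kappa[r] = mu[r] - sum(comb(r - 1, j - 1) * kappa[j] * mu[r - j]
                               for j in range(1, r))
    return kappa


def exact_third_moment_of_k2(p, n):
    """Third moment of k2 about its mean, over all samples from a 0/1 parent."""
    outcomes = []
    for sample in product((0, 1), repeat=n):
        ones = sum(sample)                      # for 0/1 data s1 = s2 = ones
        prob = p ** ones * (1 - p) ** (n - ones)
        k2 = Fraction(n * ones - ones ** 2, n * (n - 1))
        outcomes.append((prob, k2))
    mean = sum(pr * k for pr, k in outcomes)
    return sum(pr * (k - mean) ** 3 for pr, k in outcomes)


def formula_kappa_2_cubed(p, n):
    """kappa(2^3) from the formula of Example 11.3, for the same parent."""
    mu = [Fraction(1)] + [Fraction(p)] * 6      # every raw moment of a 0/1 variable is p
    k = cumulants_from_raw_moments(mu)
    return (k[6] / n ** 2 + 12 * k[4] * k[2] / (n * (n - 1))
            + 4 * (n - 2) * k[3] ** 2 / (n * (n - 1) ** 2)
            + 8 * k[2] ** 3 / (n - 1) ** 2)
```

The agreement is exact for every $p$ and every $n \geq 2$ tried, which is a useful check on all four coefficients at once.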


Example 11.4

Few things illustrate the usefulness of expressing the formulae in terms of cumulants and the power of the combinatorial method better than the simplification imported when the parent population is normal. In this case only terms in $\kappa_2$ survive, all higher cumulants vanishing.

As an illustration let us prove that $\kappa(pq) = 0$ for normal samples unless $p = q$.

The only term which can appear in $\kappa(pq)$ is $\kappa_2^{\frac{1}{2}(p+q)}$ and evidently, if $p + q$ is odd, even this cannot do so. If $p + q$ is even we have to consider the array

$$\begin{array}{cc|c}
\cdot & \cdot & 2\\
\cdot & \cdot & 2\\
\vdots & \vdots & \vdots\\
\cdot & \cdot & 2\\
\hline
p & q & p + q
\end{array}$$

Now if any entry in this array is 2 the array vanishes since the row concerned will contain only one entry. The reverse can only happen if all the entries are unity, in which case the sums $p$ and $q$ must be equal. This establishes the result.


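Example 11.5

Any $\kappa(a_1^{\alpha_1}\dots a_s^{\alpha_s})$ containing $m$ parts is of order $n^{-(m-1)}$. For example, $\kappa(3^22^2)$ is of order $n^{-3}$.

To prove this result we have to consider only the pattern function. Consider the array

$$\begin{array}{cccc|c}
\cdot & \cdot & \cdots & \cdot & a\\
\hline
a_1 & a_1 & \cdots & a_s & a
\end{array}$$

To the single separate there corresponds under Rule 6 the function $n\cdot n^{-m} = n^{-(m-1)}$.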

Furthermore, no pattern function can be of greater order in n ; for in an array with more 


than one row, with q separates there is associated the factor —t-t'-At . . . -r-r-, where 

Pi is the number of separates in the first column containing a non-zero entry, and so on. 
If there is more than one entry in the jth column the factor must be at least of order 
n* and thus the pattern function of order less than 71 ; and if there is only one entry 

in each column the order must be unless the function vanishes. Hence the result. 


11.15. By the above methods Professor Fisher worked out the sampling formulae for degree not greater than 10, and gave some of the 12th degree. The following are the results, with a number of corrections.


Second k-Statistic 


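$$\kappa(2^2) = \frac{\kappa_4}{n} + \frac{2\kappa_2^2}{n-1} \tag{11.31}$$

$$\kappa(2^3) = \frac{\kappa_6}{n^2} + \frac{12}{n(n-1)}\kappa_4\kappa_2 + \frac{4(n-2)}{n(n-1)^2}\kappa_3^2 + \frac{8}{(n-1)^2}\kappa_2^3 \tag{11.32}$$

$$\begin{aligned}
\kappa(2^4) ={}& \frac{\kappa_8}{n^3} + \frac{24}{n^2(n-1)}\kappa_6\kappa_2 + \frac{32(n-2)}{n^2(n-1)^2}\kappa_5\kappa_3 + \frac{8(4n^2 - 9n + 6)}{n^2(n-1)^3}\kappa_4^2\\
&+ \frac{144}{n(n-1)^2}\kappa_4\kappa_2^2 + \frac{96(n-2)}{n(n-1)^3}\kappa_3^2\kappa_2 + \frac{48}{(n-1)^3}\kappa_2^4
\end{aligned} \tag{11.33}$$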


k(2*) ='^ + —- 

n^n — 1) 


40/Cg/<rj 4- Z 

n\7i 1)3^’^* n3(n 


\2n + 9) 


U(n -- 2)(6n3 - I2n + 7) , 

7i^n — 1 )* n^7i 


480 


, 320(47^2 _ 9n -f 6) ^ 

H -TtT- X^Ki 


w»(w - 1)* 
1920(«-2) 

+ + (n 


384 


1)‘ 


1)» 

480(2w® - 7w + 6) 
n^(n - 1)* 

^ • • 


1 )® 

X,x^ + 


XtXt 


1280(w 




x^xS + 


1920 


n(n — 1) 




(11.31) 

(11.32) 

(11.33) 


- 2 ) 


(11.34) 





*c(2«) 


I 


j«it + 


60 


n*(f» 


160(n-?) 240(2«» 

T)'^^*** + »*(«—I)*''*** +- 


6n -f 4) 

—r: - 

n^(n — 1)* 


.. [p — If n^\n — x;“ — i;* 

, 96(»-&)(7w»-14w +9) , 4(113 w*-520»»4-9SO»«-800w+26 5) j 

»*(» — 1)* »*(» — i)* 

, 1200 , 4800(n - 2) , 2400(5»i» - 12» + 9) 

' n»(» - n»(n - 1 )* »»(n - 1 )« 

160(w - 2)( 3 1w - 6 3) ij 960(w - 2){0?i* - 12» + 7) , 
n\n — 1 )* »®(» — 1 )* 

1 920(re - 2)(9n ® - 23w + 16) 480(lln» - 41w* + 69« - 31) 

' n*(» — 1 )® n*{n — 1 )* 

4 - ^ 4 . 38400(«-2) 9d00(4n® - + 6 ) 

n*(n — 1 )* * * n\n — 1 )* ‘ * 2 — 1 )® * * 

28800(2n* - In + 6) , 960(»t - 2)(5w - 12), - 

"i TTTT-rn-^4^3^* i :“irr“-“r 


__^ ^ 2 ^ . 960(^ - 2)(5n - 12) 

n*(» - l)s « 8 « + ''* 


38400(n-2) 3 3840^ 

»i7m — il® * * (n — 11 


28800 
n(n — 1 ) 




Third k-Statistic

$$\kappa(3^2) = \frac{\kappa_6}{n} + \frac{9}{n-1}\kappa_4\kappa_2 + \frac{9}{n-1}\kappa_3^2 + \frac{6n}{(n-1)(n-2)}\kappa_2^3 \tag{11.36}$$

1 27 

ic(3») == 

n(n — 1 ) 

, 54(4?i - 7) 

-] - 


1)(7?, - 2 ) 

, 27(3w - 4) , 27(4n - 7) 

+ 4^—i)5 + „(„ _ 1). 

12) . , 36(7w® - 30re + 34) , 


. (11.36) 


+ 


(n - l)*(w 
108 M(r)W 


2 ) 


KjKl 


1 )* 
182(5m 


{n - !')*(» - 2 ) 


XtKjKt + ■ 


{n — iy(n - ' 2 )* 


{n 




- KK® 

l)*(w-2)® ’ - • 

108(2n 

1)'<^”'^‘ + -^2(7f:r'iy2 


. (11.37 


108(7w» - 20w + 16) 

TC*(W — 1)® 


«7'f» + 


27(1 7m« 


«®(w - 1) 


27(17w® - 49ro + 3 .5) 
n\n 

iln + 39) 3 , 27(37w - 70) ^ 




'-xl + 


324(19w® - 67n + 64) 
“ l)»(»- 2 ’^) 


+ 

+ 


n{n 

108(82w» _ 

n{n — l)®(w — 2 )® 
324(7.6?),® - 473»® + 1016?i 


162(66?i» — 245?? + 234) 
x,ic,Kt _ l)a(^ _ 2) 


n(n — !)*(?? — 2 )* 


XtXfKt 


481??® + 958?? - 640) ^ , 108(59??® - 220?? + 224) , 

-rr-TT-:rr::- XfK^ -)--:-rr:r:-r:-?Cti 


756) 


??(?? — 1 )®(?? — 2 ) 


XtKt 


KiK^Kz 


n{n — 1 )®(?? — 2 )® 

27(173??® - 1503??® + 4962??® - 7380?? + 4200) , 

??(?? — !)*(?? — 2 )® 

, 108(71??* - 263?? + 234) ^ , , 648(79??® - 343?? + 378)., 

{n — !)*(?? — 2 )» '^“'^•'^® 


, 486(63??* - 290?? + 362) ^ ^ , 

+ (w _ 1)8 (to _ 2)» 


+ —( r:^ (^ -2)® 

972(99??® - 688 ??» + 1612?? - 1280) 
(?? - !)»(?? - 2)* 



270 


APPROXIMATIONS TO SAMPMNO DISTRIBUTIONS 


162(87»* - 594n* + 1420» - 1176) « 972n(22»* - 103« + 118) 

(n~ !)*(» ~ 2 )» (n - !)»(» *- 2 )» 

648n(103 in« - 510w + 640 * 3 , 648w«( 5w - 12) , 

(» — !)*(» — 2 )* (n — l)*(n — 2 )*^* 


1 C 4 KJ 

. (11.38) 


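Fourth k-Statistic

$$\begin{aligned}
\kappa(4^2) ={}& \frac{\kappa_8}{n} + \frac{16}{n-1}\kappa_6\kappa_2 + \frac{48}{n-1}\kappa_5\kappa_3 + \frac{34}{n-1}\kappa_4^2 + \frac{72n}{(n-1)(n-2)}\kappa_4\kappa_2^2\\
&+ \frac{144n}{(n-1)(n-2)}\kappa_3^2\kappa_2 + \frac{24n(n+1)}{(n-1)(n-2)(n-3)}\kappa_2^4
\end{aligned} \tag{11.39}$$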

«{4«) 


+ 

+ 


144w 


(n - l)(w - 2 ) 

1 . _ 4 8 

n{n — 1 ) 




24?i(n + 1) 


(rt - 1)(« - 2)(w - 3) 


(11.39) 


, 16(13w ~ 17) , 12(41n - 65) 

KloXt H--r.„- K,Ki + -- 

«(w. — 1 )* n(n — 1 )* 


48(16n - 29)^ ^ 12(37« - 72(1 In - 19) ^ ^3 

n{n — 1 )* ' ’ * n(n — 1 )® * (n — l)*(n — 2 ) * * 

288(19» - 41) 48(203« - 523) 

"T 7-tT®/ -n. 1 / n\XtXtXg 

(n — l)*(n — 2 ) (w •— l)*(w — 2 ) 


+ 

+ 


144(56w* - 257n -f 302) „ , * , 1440(4n - 11) , 


(n - l)*(n - 2 )* 

1152(22«* - 106w + 133) 
(re'-l)*(n“ 2 )* 


+ 


(n — l)*(re — 2 ) 




, 8(709re* - 343()re + 44.56) 

XtKiKt ---ova- X 

(re — l)®(w — 2)* 


288(19«* - 98w* + 125« + 2) , 1728(24w» - 140re* + 200re + 4) 


(re - !)*(« - 2)*(re - 3) 




—-v*'-. 1 —. ' If k'* 4- 

^ (re - l)*(re - 2)*(ft - 3) * ^ ^ 

432(49?)*-287re*+408w + 12) ^ 2 , 864(103re»-629«i‘ + 948re +-24) ^ 

“"(re-l)*(«~2)»(re-3) + (■ 7 r-l)*(w-l)»(re'^)““ 

288(41re<-384re»-f 1209rea-]282«-36) ^ 288«(5.3«2-179re- .52) , 

■ '"(w-l)*(re-^)*(re-3)‘®~ (re^“(re-2)2(re-3) 

1728M(29re8-196M*4-317re+62) , , 172Sre(re-f!)(«“- .5w +-2) ^ 

(re-l)*(re-2)>-~3)* ^ (re-~l)*(re-2)2(re-3)* ^ 


Fifth k-Statistic 


$$\begin{aligned}
\kappa(5^2) ={}& \frac{1}{n}\kappa_{10} + \frac{25}{n-1}\kappa_8\kappa_2 + \frac{100}{n-1}\kappa_7\kappa_3 + \frac{200}{n-1}\kappa_6\kappa_4 + \frac{125}{n-1}\kappa_5^2\\
&+ \frac{200n}{(n-1)(n-2)}\kappa_6\kappa_2^2 + \frac{1200n}{(n-1)(n-2)}\kappa_5\kappa_3\kappa_2 + \frac{850n}{(n-1)(n-2)}\kappa_4^2\kappa_2\\
&+ \frac{1500n}{(n-1)(n-2)}\kappa_4\kappa_3^2 + \frac{600n(n+1)}{(n-1)(n-2)(n-3)}\kappa_4\kappa_2^3\\
&+ \frac{1800n(n+1)}{(n-1)(n-2)(n-3)}\kappa_3^2\kappa_2^2 + \frac{120n^2(n+5)}{(n-1)(n-2)(n-3)(n-4)}\kappa_2^5
\end{aligned} \tag{11.41}$$





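Sixth k-Statistic

$$\begin{aligned}
\kappa(6^2) ={}& \frac{1}{n}\kappa_{12} + \frac{1}{n-1}\big(36\kappa_{10}\kappa_2 + 180\kappa_9\kappa_3 + 465\kappa_8\kappa_4 + 780\kappa_7\kappa_5 + 461\kappa_6^2\big)\\
&+ \frac{n}{(n-1)(n-2)}\big(450\kappa_8\kappa_2^2 + 3600\kappa_7\kappa_3\kappa_2 + 7200\kappa_6\kappa_4\kappa_2 + 6300\kappa_6\kappa_3^2\\
&\qquad\qquad + 4500\kappa_5^2\kappa_2 + 21600\kappa_5\kappa_4\kappa_3 + 4950\kappa_4^3\big)\\
&+ \frac{n(n+1)}{(n-1)(n-2)(n-3)}\big(2400\kappa_6\kappa_2^3 + 21600\kappa_5\kappa_3\kappa_2^2 + 15300\kappa_4^2\kappa_2^2\\
&\qquad\qquad + 54000\kappa_4\kappa_3^2\kappa_2 + 8100\kappa_3^4\big)\\
&+ \frac{n^2(n+5)}{(n-1)(n-2)(n-3)(n-4)}\big(5400\kappa_4\kappa_2^4 + 21600\kappa_3^2\kappa_2^3\big)\\
&+ \frac{n(n+1)(n^2 + 15n - 4)}{(n-1)(n-2)(n-3)(n-4)(n-5)}\,720\kappa_2^6
\end{aligned} \tag{11.42}$$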


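Product-Cumulant Formulae

$$\kappa(32) = \frac{1}{n}\kappa_5 + \frac{6}{n-1}\kappa_3\kappa_2 \tag{11.43}$$

$$\kappa(42) = \frac{1}{n}\kappa_6 + \frac{8}{n-1}\kappa_4\kappa_2 + \frac{6}{n-1}\kappa_3^2 \tag{11.44}$$

$$\kappa(52) = \frac{1}{n}\kappa_7 + \frac{10}{n-1}\kappa_5\kappa_2 + \frac{20}{n-1}\kappa_4\kappa_3 \tag{11.45}$$

$$\kappa(62) = \frac{1}{n}\kappa_8 + \frac{12}{n-1}\kappa_6\kappa_2 + \frac{30}{n-1}\kappa_5\kappa_3 + \frac{20}{n-1}\kappa_4^2 \tag{11.46}$$

$$\kappa(72) = \frac{1}{n}\kappa_9 + \frac{14}{n-1}\kappa_7\kappa_2 + \frac{42}{n-1}\kappa_6\kappa_3 + \frac{70}{n-1}\kappa_5\kappa_4 \tag{11.47}$$

$$\kappa(82) = \frac{1}{n}\kappa_{10} + \frac{16}{n-1}\kappa_8\kappa_2 + \frac{56}{n-1}\kappa_7\kappa_3 + \frac{112}{n-1}\kappa_6\kappa_4 + \frac{70}{n-1}\kappa_5^2 \tag{11.48}$$

$$\kappa(43) = \frac{1}{n}\kappa_7 + \frac{12}{n-1}\kappa_5\kappa_2 + \frac{30}{n-1}\kappa_4\kappa_3 + \frac{36n}{(n-1)(n-2)}\kappa_3\kappa_2^2 \tag{11.49}$$

$$\kappa(53) = \frac{1}{n}\kappa_8 + \frac{1}{n-1}(15\kappa_6\kappa_2 + 45\kappa_5\kappa_3 + 30\kappa_4^2) + \frac{n}{(n-1)(n-2)}(60\kappa_4\kappa_2^2 + 90\kappa_3^2\kappa_2) \tag{11.50}$$

$$\kappa(63) = \frac{1}{n}\kappa_9 + \frac{1}{n-1}(18\kappa_7\kappa_2 + 63\kappa_6\kappa_3 + 105\kappa_5\kappa_4) + \frac{n}{(n-1)(n-2)}(90\kappa_5\kappa_2^2 + 360\kappa_4\kappa_3\kappa_2 + 90\kappa_3^3) \tag{11.51}$$

$$\begin{aligned}
\kappa(73) ={}& \frac{1}{n}\kappa_{10} + \frac{1}{n-1}(21\kappa_8\kappa_2 + 84\kappa_7\kappa_3 + 168\kappa_6\kappa_4 + 105\kappa_5^2)\\
&+ \frac{n}{(n-1)(n-2)}(126\kappa_6\kappa_2^2 + 630\kappa_5\kappa_3\kappa_2 + 420\kappa_4^2\kappa_2 + 630\kappa_4\kappa_3^2)
\end{aligned} \tag{11.52}$$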
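The terms in $1/(n-1)$ in the product-cumulant formulae all come from arrays with two rows and two columns, whose pattern function is $1/(n-1)$, so their numerical coefficients can be counted directly from Rule 5. The Python sketch below is an illustration written for this purpose (the function name is hypothetical, and the closed-form count is this sketch's own application of Rules 5 and 7, not a formula quoted from the text). It counts, for $\kappa(pq)$, the ways of filling a 2 × 2 array with column totals $p$, $q$ and row totals $a$, $b$ so that no entry is zero.

```python
from math import comb


def two_row_coefficients(p, q):
    """Numerical coefficients of kappa_a kappa_b / (n - 1) in kappa(pq),
    with a >= b >= 2 and a + b = p + q.  The second row is (j, b - j) and the
    first row (p - j, q - b + j); every entry must be at least 1 (Rule 7)."""
    out = {}
    for b in range(2, (p + q) // 2 + 1):
        a = p + q - b
        ways = sum(comb(p, j) * comb(q, b - j)
                   for j in range(1, b)
                   if j <= p - 1 and b - j <= q - 1)
        if a == b:
            ways //= 2          # Rule 5: divide by 2! for the repeated row total
        if ways:
            out[(a, b)] = ways
    return out
```

For example, `two_row_coefficients(7, 2)` returns the coefficients 14, 42 and 70 of (11.47), and `two_row_coefficients(6, 4)` returns 24, 96, 194 and 120.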





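$$\begin{aligned}
\kappa(54) ={}& \frac{1}{n}\kappa_9 + \frac{1}{n-1}(20\kappa_7\kappa_2 + 70\kappa_6\kappa_3 + 120\kappa_5\kappa_4)\\
&+ \frac{n}{(n-1)(n-2)}(120\kappa_5\kappa_2^2 + 600\kappa_4\kappa_3\kappa_2 + 180\kappa_3^3)\\
&+ \frac{n(n+1)}{(n-1)(n-2)(n-3)}\,240\kappa_3\kappa_2^3
\end{aligned} \tag{11.53}$$

$$\begin{aligned}
\kappa(64) ={}& \frac{1}{n}\kappa_{10} + \frac{1}{n-1}(24\kappa_8\kappa_2 + 96\kappa_7\kappa_3 + 194\kappa_6\kappa_4 + 120\kappa_5^2)\\
&+ \frac{n}{(n-1)(n-2)}(180\kappa_6\kappa_2^2 + 1080\kappa_5\kappa_3\kappa_2 + 720\kappa_4^2\kappa_2 + 1260\kappa_4\kappa_3^2)\\
&+ \frac{n(n+1)}{(n-1)(n-2)(n-3)}(480\kappa_4\kappa_2^3 + 1080\kappa_3^2\kappa_2^2)
\end{aligned} \tag{11.54}$$

$\kappa(32^2)$, $\kappa(42^2)$, $\kappa(52^2)$, $\kappa(62^2)$, $\kappa(3^22)$, $\kappa(432)$, $\kappa(532)$: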

16 , 12(2» - 3) , 48 j 

r=T)• • 

20 „ „ , I - 10 ) ^^, 

i — 1) * ’ n{n — 1 )® ' * m(» — 1 )* * 

s , 120 ^ 

,>C 4 >c| 4- ( -. 

24 , 20(3re - 4) , 20(6ra - 7) 

I — 1 ) n{n — 1 )* n(n — 1 )* 

s , 480 , 120 3 

s . . . . 

28 , 12(7« - 9) , 4(41» - 56) 

- :r.X»Xt H-^-— K,Xt H- - -- KfKi 

« — 1 ) n{n — 1 )* n{n — 1 )* 

- 7) 3 , 168 3 , 840 ^ 660 3 

1 ). ”5 + + orni)'-''-''- + (,—i?"’'- 


6 (ite-ll) 8(.'to - 5 ) 

‘ ’ n(n — 1 )* ' ’ n(n — 1 )* 


n(n — 1 )**^* 

3 18(9rt 


i»^yrt — 20 ) 3 ZQn , 

(ir^T)a(n~^^) (»r-iy^« - 2 ) 

+ I 1 0(11^-17 ) 

n{n — 1)* * ^ 7i(n — 1)* ‘ * 

,? + 12( 61^ - 128) ^ ^ 36( 5;> - 12 ) _ 

^ (« — l)*(tt — 2) * ’ ’ (w — 1 )^(m — 2) “ 

.8 

, lOlw — 131 , 6{37w - 66 ) 

' + + n(n - !)■ 

30(9«. - 16) 3 , 30(46« - 92) 

n - l)*(n - 2 )'"*'^* - l)*(w - 2 ‘)''‘'"’'^* 


(11.54) 

(11.65) 

( 11 . 66 ) 

(11.67) 

(11.58) 

(11.69) 

(11.60) 





+ 


60(15 ^-31) ^(45» - 103) „ 

(n - l[*(» - 2) ‘ ’ ■'■{»- l)*(n ~ 2) ‘ ® 


720n 


(n - 1)*(M - 2) 




1620n 
(to -!)*(» - 2)' 


+ T.r — 


k{4*2) 


1 , 32 , 8(13n - 37) , 4(49 to - 73) 

' '^10+ -r, — ..KaKt H- , -,-2 k,k» + - . —j., 

n* n{n — 1) w(w — 1)® »?(w — 1)^ 

, 4(29n - 46) „ , 8(37w - 65) „ , 1536 

+ 1)> + (i+ K"- I)-'"-''- 


+ 


144(7«. - 15) „ , 72(21w - 50) „ , 

J-i-^ Jfj/f' + 


{« - l)8(n - 2) 




(ft - !)*(« - 2) 

192w(w 1) 


^6(1 Oft* -_27n - ^ 
(ft - l)*(rr-“2)(ft - 3) 


144(17ft* - 53w 2 , _ 

(ft - l)*(ft - 2){ft - 3)'^’’'^^ (ft - l)*(ft - 2)(ft - 3) 


ft® . 


/f(43®) S= —^Kio + 


_ 33 

n{7i — 1) 


0(19ft- 25) 3(65ft-107) 

^8^8 —Z -i\o - :r\y^ ^6^4 


7i(n — 1)“ 


71 {rt — 1)^ 


6(19ft - 34) 18(19ft - 33) 72(23ft - 52) 

^(,r-1)^ ''® + (ft - i)^(ft - + (ft _ i)->(ft - 2)'"“''®''* 


64(19ft - 48) 2 , 54(33ft* - 148ft 1- 172) ^ „ 

(i^-:i)i(r7^ 2)'"*'^“ (ft - l)*(w - 2)* “ 


72ft(17ft_-40) l«^«(27ft_-^70) 

^ (« _ l)2(w — 2)* * * ^ (n — — 2W ‘ “ ^ 


216»* 


(ft - 1)^(« - 2)^ 


(ft - l)*(ft 2)*'^-' ■ 


ft(32*) = \ft. 


30 _ 

7i^{n - 1) 


. 2(31n - 53) , _ 23?? + JO) 

,w'^6ft3+- 


ft*(ft ~ ])J 


ft*(ft - ])■' 


240 


«(ft — 1)'*”^’^'* ' ft(« — 1)* 


, , 360(2ft 3) , 24(5ft - 12) , 

-ft4ft,ft4 f ,,, ftj 


??(r? J)^ 


4H0 

(?? - 1 ) 


, KzkI 


ft(42^) 


, ftio + 


36 

ft*(ft - 1) 


, 4(23ft - 37) , 4(47«* - 12()ft 81) 

XkX2 + ,, .,,ft7ft3+ - ft6'C4 


ft^(ft - 1)- 


ft*(« - !)•» 


, 12(9ft* 24ft + 17) „ , 

+ “ -iTn — ft; f 


360 


, 288(5ft - 7) 

«/ i\3 ^ —/ ,,, -ft^ftifts 

ft-(ft — 1)* «{ft — !)■= /((ft 

, 144(7ft - 10) , 24(49« - 95) „ , 9t)0 

+ //(ft -1)^ "J"* + ',,(„_-T)^ "‘"’ + (7,-: i)'"-"^^- ■ 


> , 2160 , 

(ft - !)•* ' ^ 




37 

7?‘^(?? 1) 


“h 


0(17?? - 27) 


, ft-ft3 + 


3(61ft- - I66 h -I 117) 


//‘(ft - 1) 

154// -r 113) „ 6(67» - 131) 

n^(n — 1)* ’ ^ //(ft — !)-(// — 2 


■f 


2(50// 

24(71//* - 246// 1 202) 


//(«-!)'(// 2) 


ft,ft»fti -f 


//*(« - 1)» 
fteftf 

36(20//-’ 103// 4 03) 


, 36(38//.* — 155ft 4- 160) 

w(ft !)*(// - 2) (ft - ])*(// - 2)'''*'^-* 


r?(?i - 1)‘^(?? - 2 ) 
72(14?? 23) ^ 


K\K 2 


A.S. 


4. ^44( 19 n - 44 ) 

^ lYin 2) ' ^ ^ 

“VOl I 


2SH?? 


{71 — 1)^(??- — 2) 


k; 




(11.61) 


«’4ft] 

(11.62) 


(11.63) 

(11.64) 


(11.65) 


( 11 . 66 ) 



11.16. Additional formulae for the case of a normal parent population have been worked out by Wishart (1930). There are two general formulae:—

.(11.67) 


and the following specific formulae of degree 12 and upwards (those of degree 10 and lower, of course, being derivable from equations (11.30) to (11.66) by putting all $\kappa$'s higher than the second equal to zero).

34,t56(M - 

k( 3 2 ) - ^--—j^)^(^y_^)«:2.(11.69) 

7776»*(6w - 12) , 

2) = 2)1'^^. 


if(3*2) 


ic(3*2») 

k(3«23) 


108,864»*(5w_- 12) g 
> - l)'(n - 2)» 

12 ) ^ 

'‘Tn - 2)3 


_ 466,560w“(22w* - llln + 142) ^ 


k(3»2) 


k(3*2*) 

/c(4*2*) 


k(4323) 


/c(4*2*) 


KtK{2f^) 


n-1 ' 

„ 2 „/ 30 \ 

192(te(w + 1) , 

(n - l)3(n - 2)(n - 3)'"* 
23,040w(w +1) , 

’ (n"- 1 )‘(h '-2){n "3) 
322,560rt(w + 1) ^ 

‘ (w“’-“l)®(ri, - 2)(« - 3) 


(4*21 = 20,7^«(ft +J)(«® -Jn 4 2) 

^ (n - iy(7t - 2)3(?i - 3)3 ^ 


(11.69) 

(11.70) 


k(4323) 


k(4»2») = 
k(4«) = 


290,304/1(7! 4- l)(/i* — 5/! + 2) 
(^^l)*(^-~2)3(/r^ 3)3 
4,644,864/1 (/!. + 1 )(/!*- 5w +2 
(n — l)3(/i — 2)3(// — 3)3 

-6912/i(«^jM)- 

(n - !)»(// - 2)»(n - 3)» ' 

16 


(11.71) 

(11.72) 

(11.73) 

(11.74) 


(11.75) 


(11.76) 


(11.77) 


(11.78) 


(11 79) 


(11.80) 


(11.81) 


^( 4 * 2 ) =--^k,k(4*) . 


{63w< - 428//® 4 1025 m* - 474m + 180}/c*. (11.82) 


(11.83) 


‘ KJ® approximately . 


. (11.84) 
. ( 11 . 86 ) 








In virtue of the result of Example 11.4, expressions of odd degree vanish, e.g. $\kappa(32^r) = \kappa(52^r) = 0$. Further, in virtue of (11.68), $\kappa(p2^r) = 0$ if $p > 2$, for $\kappa(p) = \kappa_p = 0$ for the normal distribution. Methods of proof of (11.67) and (11.68) are suggested in Exercise 11.9. Exact results for $\kappa(4^5)$ and $\kappa(4^6)$ are given by Hsu and Lawley (1939).


Proof of the ValidUy of the Rules 

11.17. We now proceed to prove the validity of the rules enunciated and exemplified
above. Rules 1 and 2 have already been proved.

As a preliminary let us define an operator $\partial_p$ such that

$$\begin{aligned}
\partial_p \mu'_r &= r(r-1)\ldots(r-p+1)\,\mu'_{r-p}, & r &> p, \\
\partial_p \mu'_p &= p!, \\
\partial_p \mu'_r &= 0, & r &< p,
\end{aligned} \qquad (11.86)$$

and

$$\partial_p(AB) = (\partial_p A)B + A(\partial_p B), \qquad (11.87)$$

so that $\partial_p$ acting on a product is distributive.

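In virtue of (11.87) we have

$$\partial_p (\mu'_r)^m = \frac{\partial}{\partial \mu'_r}(\mu'_r)^m \; \partial_p \mu'_r .$$

It follows that if $f$ is a polynomial function in the $\mu'$'s

$$\partial_p f = \sum_r \frac{\partial f}{\partial \mu'_r}\,\partial_p \mu'_r , \qquad (11.88)$$

and this also holds if $f$ can be expanded in a series of polynomials in the $\mu'$'s.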

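Now consider the expression defining the seminvariants in terms of the moments (3.11):

$$\exp\left\{\kappa_1 t + \ldots + \kappa_p \frac{t^p}{p!} + \ldots\right\} = 1 + \mu'_1 t + \ldots + \mu'_p \frac{t^p}{p!} + \ldots$$

On operating on both sides by $\partial_p$ there results

$$\exp\left\{\kappa_1 t + \ldots + \kappa_p \frac{t^p}{p!} + \ldots\right\}\left(\partial_p \kappa_1\, t + \ldots + \partial_p \kappa_p \frac{t^p}{p!} + \ldots\right) = t^p + \mu'_1 t^{p+1} + \mu'_2 \frac{t^{p+2}}{2!} + \ldots$$

The right-hand side is $t^p$ times the original series, and hence

$$\partial_p \kappa_1\, t + \ldots + \partial_p \kappa_p \frac{t^p}{p!} + \ldots = t^p .$$

This is an identity in $t$ and hence

$$\partial_p \kappa_p = p!, \qquad \partial_p \kappa_q = 0 \quad (q \neq p). \qquad (11.89)$$

For example, since $\kappa_4 = \mu'_4 - 4\mu'_3\mu'_1 - 3\mu'^2_2 + 12\mu'_2\mu'^2_1 - 6\mu'^4_1$,

$$\begin{aligned}
\partial_1 \kappa_4 &= 4\mu'_3 - 4\mu'_3 - 12\mu'_2\mu'_1 - 12\mu'_2\mu'_1 + 24\mu'^3_1 + 24\mu'_2\mu'_1 - 24\mu'^3_1 = 0, \\
\partial_2 \kappa_4 &= 12\mu'_2 - 24\mu'^2_1 - 12\mu'_2 + 24\mu'^2_1 = 0, \\
\partial_3 \kappa_4 &= 24\mu'_1 - 24\mu'_1 = 0, \\
\partial_4 \kappa_4 &= 4!
\end{aligned}$$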
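The worked example for $\kappa_4$ can be checked mechanically. The following Python sketch is not part of the original text: it assumes a polynomial in $\mu'_1, \ldots, \mu'_4$ is stored as a dictionary from exponent tuples to integer coefficients, and the names `falling`, `apply_d` and `KAPPA_4` are illustrative only. The operator is applied term by term using (11.86) and the product rule (11.87).

```python
def falling(r, p):
    """r(r-1)...(r-p+1); this is zero whenever 0 <= r < p."""
    out = 1
    for j in range(p):
        out *= r - j
    return out


def apply_d(p, poly, order=4):
    """Apply the operator d_p to a polynomial in mu'_1 .. mu'_order.

    `poly` maps exponent tuples (e_1, ..., e_order) to integer coefficients.
    Each factor mu'_r is differentiated in turn (product rule) and replaced
    by falling(r, p) * mu'_{r-p}, with mu'_0 taken as 1.
    """
    result = {}
    for exps, coeff in poly.items():
        for r in range(1, order + 1):
            e = exps[r - 1]
            if e == 0 or r < p:
                continue
            new = list(exps)
            new[r - 1] -= 1              # remove one factor mu'_r
            if r - p >= 1:
                new[r - p - 1] += 1      # insert the factor mu'_{r-p}
            key = tuple(new)
            result[key] = result.get(key, 0) + coeff * e * falling(r, p)
    # drop terms that cancelled
    return {k: v for k, v in result.items() if v != 0}


# kappa_4 = mu4 - 4 mu3 mu1 - 3 mu2^2 + 12 mu2 mu1^2 - 6 mu1^4 (moments about the origin)
KAPPA_4 = {
    (0, 0, 0, 1): 1,
    (1, 0, 1, 0): -4,
    (0, 2, 0, 0): -3,
    (2, 1, 0, 0): 12,
    (4, 0, 0, 0): -6,
}

for p in range(1, 5):
    print(p, apply_d(p, KAPPA_4))
```

An empty dictionary stands for zero, so the first three operators annihilate $\kappa_4$ and the fourth leaves the constant $4!$, in agreement with (11.89).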



APPROXIMATIONS TO SAMPLING DISTRIBUTIONS


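11.18. Now in accordance with Rule 1, which we have already established, $\mu(a_1^{\alpha_1} a_2^{\alpha_2} \ldots a_s^{\alpha_s})$
and hence $\kappa(a_1^{\alpha_1} a_2^{\alpha_2} \ldots a_s^{\alpha_s})$ may be expressed in terms of parent $\kappa$'s by an equation of the
form

$$\mu(a_1^{\alpha_1} a_2^{\alpha_2} \ldots) = \sum \{A\, \kappa_{b_1}^{\beta_1} \kappa_{b_2}^{\beta_2} \ldots\} \qquad (11.90)$$

where $A$ is a factor which it is our object to find. Operate on both sides of (11.90) by
$\partial_{b_1}^{\beta_1} \partial_{b_2}^{\beta_2} \ldots$. Every term on the right is annihilated except that in $\kappa_{b_1}^{\beta_1} \kappa_{b_2}^{\beta_2} \ldots$
and we have, by (11.89),

$$\partial_{b_1}^{\beta_1} \partial_{b_2}^{\beta_2} \ldots\, \mu(a_1^{\alpha_1} a_2^{\alpha_2} \ldots) = A\, \beta_1!\,(b_1!)^{\beta_1}\, \beta_2!\,(b_2!)^{\beta_2} \ldots \qquad (11.91)$$

We now consider an operator $\theta_p$, analogous to $\partial_p$, which, when acting on a power $x^r$ of $x$ (of
any suffix), reduces the exponent by $p$ and multiplies by $r(r-1) \ldots (r-p+1)$; and
we will suppose the operator to be distributive.* Regarding $\mu(a_1^{\alpha_1} a_2^{\alpha_2} \ldots)$ as the mean
value of $(k_{a_1}^{\alpha_1} k_{a_2}^{\alpha_2} \ldots)$ we see that the result of operating by the $\partial$'s on the mean value is
the same as that given by taking the mean value of the operation of the $\theta$'s. But this
latter operation results in a constant, which is equal to its mean value; and we thus have

$$\partial_{b_1}^{\beta_1} \partial_{b_2}^{\beta_2} \ldots\, \mu(a_1^{\alpha_1} a_2^{\alpha_2} \ldots) = \theta_{b_1}^{\beta_1} \theta_{b_2}^{\beta_2} \ldots\, (k_{a_1}^{\alpha_1} k_{a_2}^{\alpha_2} \ldots). \qquad (11.92)$$

Our rules are concerned with the evaluation of this operation.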


11.19. Consider now a completed array of type (11.28). A little reflection will show
that there is one such array for every term in (11.92) which does not vanish by operation,
and that every term in (11.92) will have its corresponding completed array. The numbers
in the body of the array are the powers of $x$ occurring in the $k$-product; added horizontally
they compose the orders of the operators; added vertically they compose the orders of
the corresponding $k$'s. A completed array is, so to speak, a chart of part of the operation;
and the whole operation is the sum of all possible completed arrays.

The operation (11.92) gives us the coefficients in $\mu(a_1^{\alpha_1} a_2^{\alpha_2} \ldots)$, but we wish to find
those in the corresponding $\kappa(a_1^{\alpha_1} a_2^{\alpha_2} \ldots)$. The necessary allowance is made by Rule 3,
which we now prove; that is, the coefficient of $\kappa_{b_1}^{\beta_1} \kappa_{b_2}^{\beta_2} \ldots$ in $\kappa(a_1^{\alpha_1} a_2^{\alpha_2} \ldots)$ is given by
all completed arrays, ignoring those which are resolvable into separate blocks each confined
to separate rows and columns.

Referring to equation (11.27), expressing the relation between multivariate moments
and cumulants, we see that $\kappa(a_1^{\alpha_1} a_2^{\alpha_2} \ldots)$ is the sum of terms composed of products of
one, two, three ... multivariate moments. The first term is $\mu(a_1^{\alpha_1} a_2^{\alpha_2} \ldots)$ itself.
Consider a two-part term such as $\mu(a_1^{\alpha_1'} a_2^{\alpha_2'} \ldots)\,\mu(a_1^{\alpha_1''} a_2^{\alpha_2''} \ldots)$, where $\alpha_1' + \alpha_1'' = \alpha_1$,
etc. Its coefficient in the expansion on the right-hand side of (11.27) is

1! 1! al! aV a,! ’ ‘ ‘ 

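and hence the coefficient with which it appears in the formula for $\kappa(a_1^{\alpha_1} a_2^{\alpha_2} \ldots)$ is

$$-\frac{\alpha_1!\,\alpha_2! \ldots}{\alpha_1'!\,\alpha_1''!\,\alpha_2'!\,\alpha_2''! \ldots} = -\binom{\alpha_1}{\alpha_1'}\binom{\alpha_2}{\alpha_2'} \ldots \qquad (11.93)$$

Now $\mu(a_1^{\alpha_1'} a_2^{\alpha_2'} \ldots)$ will itself have an array of type (11.28) with column totals $(a_1^{\alpha_1'} a_2^{\alpha_2'} \ldots)$
and row totals, say, $(b_1^{\beta_1'} b_2^{\beta_2'} \ldots)$; and similarly for $\mu(a_1^{\alpha_1''} a_2^{\alpha_2''} \ldots)$. Provided that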


* Op may b© regarded as €Hiuivalent to 


/ dn df' \ 


i.e. to Sp 


in the 


notation 


•f IMO. 





$\beta_j' + \beta_j'' = \beta_j$, these arrays will correspond to terms in the $\kappa$'s which, when multiplied, will
give a term in $\kappa_{b_1}^{\beta_1} \kappa_{b_2}^{\beta_2} \ldots$. Thus the product of these terms may be considered as an
array of type (11.28) with column totals $(a_1^{\alpha_1} a_2^{\alpha_2} \ldots)$ and row totals $(b_1^{\beta_1} b_2^{\beta_2} \ldots)$ and
with the body of the table resolvable into two separate blocks. Since there are $\alpha_1$ columns
of total $a_1$, there will be $\binom{\alpha_1}{\alpha_1'}\binom{\alpha_2}{\alpha_2'} \ldots$ products of this type in the expression which gives
$\mu(a_1^{\alpha_1} a_2^{\alpha_2} \ldots)$. This factor is the same as (11.93) but of opposite sign. Hence, if we
ignore the separate two-part blocks in the array for $\mu$ we shall have allowed for the products
of two moments which must be subtracted from $\mu$ to give $\kappa$.

Now some of these separate blocks will themselves be separable into two blocks, and
in subtracting them all from $\mu(a_1^{\alpha_1} a_2^{\alpha_2} \ldots)$ we subtract too much. For example, if there
are three separate blocks, $L$, $M$, $N$, we shall, by considering $L$ and $(M + N)$ as two blocks,
have subtracted $L$, $M$, $N$. We shall have done the same by considering $M$ and $(L + N)$,
and $N$ and $(L + M)$ as two blocks. That is, we have subtracted $2L$, $2M$, $2N$ too much.
We must restore these blocks to the array for $\mu$ again. Such additions, summed over all
blocks of three, will be found to equal the terms in the expansion of (11.27) which result
from the product of three moments.

In restoring these blocks we restore too many of the cases where there are four separate
blocks. These must be subtracted again, and correspond to the negative term in (11.27)
involving the product of four moments. Proceeding in this way we establish Rule 3.


11.20. Now we proceed to Rules 4, 5 and 6, which are the fundamental rules of the
whole process. Consider again the array of type (11.28) to fix the ideas, say,

    2   3   1 |  6
    1   1   . |  2
    1   .   1 |  2
    ----------+----
    4   4   2 | 10        (11.94)

This array will represent a number of terms in the operation, each of which consists of the
operation of $\theta_6$ on a term $x^2 x^3 x$ (the first row), $\theta_2$ on $x\,x$ (the second row), and so on.

Provided that the suffixes of the $x$'s in any row are alike, every suffix of the $x$'s will provide
a term, for $k_p$ contains terms with every distribution of powers (adding to $p$) and suffixes.
There will, for instance, be terms of the following kind:


0 

a-i 

.rj 


yi 


Xi 


yi 

^1 


3*2 

• 



• 



• 

“■3 

• 

•^1. 

»’2 

• 



. 

0-1. 


In fact, for any completed array, we have terms in which

(i) all the $x$'s have the same suffix ($n$ in number, one for each suffix),

(ii) all the $x$'s but one row have the same suffix ($n(n-1)$ in number),

(iii) all the $x$'s but two rows have the same suffix and the remaining two are the same
($n(n-1)$ in number),

and so on. These cases correspond to the various separations dealt with in Rule 5.

Now in case (i) the term in any column arises from the term in $x^p$ in $k_p$ and (apart
from numerical factors which are considered presently) is $\frac{1}{n}$ from equation (11.16).
Hence any column which contains an entry contributes a factor $\frac{1}{n}$ and the total function
of $n$ arising from case (i) is the product of $n$ and of $\frac{1}{n}$ to the power of the number of
columns containing a non-zero entry.

Similarly in cases (ii) and (iii) the $n$-function for each separation is the product of
$n(n-1)$ and, for each column, a factor in $n^{-1}$ or $-\frac{1}{n(n-1)}$ according as the column contains
non-zero entries in one or in both parts of the separation; and so on.

This explains the origin of the pattern function as described in Rule 6. But in order
to establish that rule completely (and incidentally to establish Rules 4 and 5) we have to
show that the numerical coefficients arising from each separation are the same. When
this is done the validity of Rule 6 is demonstrated, for the separate contributions in $n$ may
be added together to give the pattern function and the whole multiplied by the numerical
coefficient.
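The summation over separations can be carried out mechanically. The following Python sketch is not part of the original text. It assumes that a column represented in $m$ separates contributes the factor $(-1)^{m-1}(m-1)!/\{n(n-1)\ldots(n-m+1)\}$, of which $1/n$ and $-1/\{n(n-1)\}$ are the cases $m = 1, 2$, and that a separation into $q$ separates carries the factor $n(n-1)\ldots(n-q+1)$. The function names are illustrative, patterns are given as lists of rows of 0/1 entries, and the pattern is assumed not to be resolvable into separate blocks (such arrays are excluded by Rule 3).

```python
from fractions import Fraction
from math import factorial


def set_partitions(items):
    """Yield every partition of a list into non-empty blocks."""
    if not items:
        yield []
        return
    first, rest = items[0], items[1:]
    for part in set_partitions(rest):
        # place `first` in each existing block in turn ...
        for i in range(len(part)):
            yield part[:i] + [[first] + part[i]] + part[i + 1:]
        # ... or in a new block of its own
        yield [[first]] + part


def falling(n, q):
    """n(n-1)...(n-q+1)."""
    out = 1
    for j in range(q):
        out *= n - j
    return out


def pattern_function(pattern, n):
    """Sum the contributions of all separations of the rows, for integer n.

    n must be at least the number of rows so that no factor vanishes.
    """
    rows = list(range(len(pattern)))
    ncols = len(pattern[0])
    total = Fraction(0)
    for separation in set_partitions(rows):
        term = Fraction(falling(n, len(separation)))
        for c in range(ncols):
            # number of separates in which column c has a non-zero entry
            m = sum(1 for block in separation
                    if any(pattern[r][c] for r in block))
            if m == 0:
                continue
            term *= Fraction((-1) ** (m - 1) * factorial(m - 1), falling(n, m))
        total += term
    return total


n = 10
print(pattern_function([[1, 1], [1, 1]], n))                   # compare 1/(n-1)
print(pattern_function([[1, 1, 1], [1, 1, 0], [1, 0, 1]], n))  # pattern of (11.94)
```

For the pattern of (11.94) the sum reduces to $1/(n-1)^2$, the value quoted in Exercise 11.1 for the same pattern with its rows permuted.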

$\theta_1$ may be considered as the operation of picking out an $x$ from the operand in all
possible ways and replacing it by unity. Similarly $\theta_p$ may be regarded as picking out
$p$ $x$'s with the same suffix and replacing them by unity. It is thus evident that operating
on a $k$ product by a $\theta$ product of the same degree will yield a result which is
the number of ways in which sets of $x$'s can be picked out of the $k$ product so that each
set contains $b_1$ of one suffix, $b_2$ of a second suffix (which may be the same as the first), and
so on.

Now consider the operation (11.92) in which the $k$'s are expressed in the simplified
form (11.17). The operations $\theta$ being distributive, we shall emerge from the operation
with a sum of terms comprising all the possible ways in which the individual $x$'s can be
picked out of the $k$ product such that the row and column totals of the two-way array are
satisfied. Consider the sets corresponding to a particular array, such as (11.94). The
contribution to the total will consist of the ways of picking out individuals such that

(i) from the individuals in the first $k_4$ are chosen four in the partition (2, 1, 1),

(ii) from the second $k_4$ are chosen four in the partition (3, 1),

(iii) from the $k_2$ are chosen two in the partition (1, 1),

(iv) these are associated in all possible ways such that individuals in a row arise from
the same suffix.

On consideration it will be seen that the total number of ways of doing this is the number
of ways of allocating the individuals from column totals as required by Rule 5; and this is
true whether sets of rows have the same suffix or not.

Rules 5 and 6, and hence Rule 4, follow at once.


11.21. The remaining rules are ancillary.

Rule 7 follows from Rule 2. In fact, the pattern function is independent of the numbers
composing the array, and the pattern with a row containing one element can therefore
form the skeleton of an array in which that element is unity; and this would entail the
appearance of $\kappa_1$, which by Rule 2 is impossible.

Rule 8 follows from Rule 11. The column containing the single element appears in just
one separate of all the separations, and the contributions to the pattern function are thus
all multiplied by $n^{-1}$ owing to its presence.

Rule 10 follows from Rule 8. The addition of a unit part is equivalent to the addition
of an extra column containing unity. This multiplies all pattern functions by $n^{-1}$, leaves
numerical coefficients unchanged and increases the suffix of every $\kappa$ according to the row
in which the unit appears.


11.22. There only remains to prove Rule 9. Note that any pattern function can be
evaluated linearly in terms of the functions of the pattern obtained by omitting one of 
the columns. For example, consider the right-hand column of 

XXX 

y ^ ^.(11.95) 

XX.. 

and the contributions to the pattern function from it. The 15 separations which are 
possible with four rows can be divided into two classes, that in which the two rows 
in the fourth column lie in the same separate and that in which they do not. In separations 
of the first type the contributions from the first three columns will be the contributions 
of all separations of 

XXX 

XX.(11.96) 

X X 

in which the first two rows are amalgamated. Considering the function of the first three rows 


X X 
X . X 
X X 
X X 


• (11.97) 


in which amalgamation has not taken place, we see that the contribution consists of all
contributions which do not occur in the first. Calling the first $A$ and the second $B$, we see
that the contribution is

$$\frac{1}{n}A - \frac{1}{n(n-1)}(B - A),$$

i.e. a linear function of the derived patterns $A$ and $B$. The proof of the general result
follows exactly the same lines.

Now if a pattern may be divided into two groups connected only by a single column
we can reduce it step by step by omitting the other columns. We end up with this single
column, and the pattern function of this column must vanish; for the column total
$a$ corresponds to $k_a$, whose mean value the one-column array expresses, and since by definition
this mean value is $\kappa_a$ no composite terms such as would be given by two rows or more
can appear.


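11.23. As an illustration of the way in which the sampling formulae can be used to
approximate to a sampling distribution, let us consider the distribution of $\sqrt{b_1}$ in samples
from a normal population. We have, in terms of the sample moments,

$$\sqrt{b_1} = \frac{m_3}{m_2^{3/2}} = \frac{n-2}{\sqrt{n(n-1)}}\,\frac{k_3}{k_2^{3/2}} .$$

For a normal distribution the variance of $k_3$, $\kappa(3^2)$, is, by (11.36), equal to

$$\frac{6n}{(n-1)(n-2)}\,\kappa_2^3 . \qquad (11.98)$$

We therefore consider the statistic

$$x = \left\{\frac{(n-1)(n-2)}{6n}\right\}^{1/2} \frac{k_3}{k_2^{3/2}} = \frac{n-1}{\sqrt{6(n-2)}}\,\sqrt{b_1}, \qquad (11.99)$$

which will, to order $n^{-1}$, have unit variance. We have

$$x = \left\{\frac{(n-1)(n-2)}{6n}\right\}^{1/2} \frac{k_3}{\kappa_2^{3/2}} \left\{1 + \frac{k_2 - \kappa_2}{\kappa_2}\right\}^{-3/2} . \qquad (11.100)$$

Since the population is symmetrical the mean value of $x$ is zero. We then have, expanding
(11.100),

$$x^2 = \frac{(n-1)(n-2)}{6n}\,\frac{1}{\kappa_2^3}\left\{k_3^2 - \frac{3}{\kappa_2}k_3^2(k_2 - \kappa_2) + \frac{6}{\kappa_2^2}k_3^2(k_2 - \kappa_2)^2 - \frac{10}{\kappa_2^3}k_3^2(k_2 - \kappa_2)^3 + \ldots\right\}. \qquad (11.101)$$

The variance may be obtained by taking mean values of both sides, and since $\kappa_2$ is the
mean value of $k_2$ we have

$$\operatorname{var} x = \frac{(n-1)(n-2)}{6n\,\kappa_2^3}\left\{\mu(3^2) - \frac{3}{\kappa_2}\mu(3^2 2) + \frac{6}{\kappa_2^2}\mu(3^2 2^2) - \frac{10}{\kappa_2^3}\mu(3^2 2^3) + \ldots\right\}. \qquad (11.102)$$

We now express the product $\mu$'s in terms of product $\kappa$'s by using equation (11.27)
and identifying coefficients. For a normal distribution $\kappa(3\,2^r) = 0$ and we will take our
approximation to order $n^{-3}$, so that $\kappa$'s of six parts or more may be neglected. We
then find

$$\begin{aligned}
\operatorname{var} x = \frac{(n-1)(n-2)}{6n\,\kappa_2^3}\Big[\, & \kappa(3^2) - \frac{3}{\kappa_2}\kappa(3^2 2) + \frac{6}{\kappa_2^2}\{\kappa(3^2 2^2) + \kappa(3^2)\kappa(2^2)\} \\
& - \frac{10}{\kappa_2^3}\{\kappa(3^2 2^3) + 3\kappa(3^2 2)\kappa(2^2) + \kappa(3^2)\kappa(2^3)\} \\
& + \frac{15}{\kappa_2^4}\{6\kappa(3^2 2^2)\kappa(2^2) + 4\kappa(3^2 2)\kappa(2^3) + \kappa(3^2)\kappa(2^4) + 3\kappa(3^2)\kappa^2(2^2)\} \\
& - \frac{21}{\kappa_2^5}\{15\kappa(3^2 2)\kappa^2(2^2) + 10\kappa(3^2)\kappa(2^3)\kappa(2^2)\} + \frac{28}{\kappa_2^6}\,15\,\kappa(3^2)\kappa^3(2^2)\Big]. \qquad (11.103)
\end{aligned}$$

Substituting the values of equations (11.31) to (11.85) we find, after some purely algebraic
reduction,

$$\operatorname{var} x = 1 - \frac{6}{n} + \frac{22}{n^2} - \frac{70}{n^3} + \ldots \qquad (11.104)$$

In a similar way (for details, see E. S. Pearson, 1930) we find

$$\mu_4(x) = 3 - \frac{1056}{n^2} + \frac{24{,}132}{n^3} - \ldots \qquad (11.105)$$

$\mu_3(x)$ is zero, for the distribution is symmetrical.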

THE MULTIVARIATE CASE 




Thus it appears that as $n \to \infty$ the second moment of $x$ tends to unity and the fourth
moment to 3, which is in conformity with tendency to normality. But the tendency is
by no means very rapid. When $n = 100$ the variance is approximately 0.942 and in
assuming $x$ to be distributed with unit variance we should commit an error of about
6 per cent.
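The figure quoted for $n = 100$ can be checked against the exact variance. The following Python sketch is not part of the original text: it assumes the exact result $\operatorname{var}(k_3/k_2^{3/2}) = 6n(n-1)/\{(n-2)(n+1)(n+3)\}$ quoted in Exercise 11.16, which gives $\operatorname{var} x = (n-1)^2/\{(n+1)(n+3)\}$, and compares it with the leading terms $1 - 6/n + 22/n^2 - 70/n^3$ of its expansion in powers of $1/n$. The function names are illustrative.

```python
from fractions import Fraction


def var_x_exact(n):
    """Exact variance of x, assuming the result quoted in Exercise 11.16."""
    return Fraction((n - 1) ** 2, (n + 1) * (n + 3))


def var_x_series(n):
    """Terms up to n^-3 of the expansion of var_x_exact in powers of 1/n."""
    u = Fraction(1, n)
    return 1 - 6 * u + 22 * u ** 2 - 70 * u ** 3


for n in (25, 100, 400):
    print(n, float(var_x_exact(n)), float(var_x_series(n)))
```

At $n = 100$ both expressions agree to five decimal places, which illustrates how slowly the variance approaches unity.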

11.24. There are two ways of improving on the first approximation that $x$ is normally
distributed with unit variance. In the first place we may consider a transformation to
a new variate $\xi$ chosen so that $\xi$ is normally distributed to a higher order in $n^{-1}$. Secondly,
we may fit a Pearson curve to the distribution of $x$, using the values of moments given
by (11.104) and (11.105). The appropriate curve is the Type VII


dFoc 



. (11.106) 


The first line was adopted by Fisher (1928), who obtained the following transformation: 




The second was adopted by E. S. Pearson (1930), who tabulated the 1 per cent and 5 per
cent significance points of (11.106), that is to say the values of the deviates $x$ for various
values of $n$ such that 99 per cent and 95 per cent of the total frequency of the sampling
distribution falls within a range of $\pm x$ on each side of the mean.


The Multivariate Case

11.25. The foregoing results can be generalised to the multivariate case, and we
give an outline of the extension to that of two variates.

Given any bipartite number $(pp')$ we shall have for any partition $\{(p_1 p_1')^{\pi_1} (p_2 p_2')^{\pi_2} \ldots\}$
and the bivariate cumulant $\kappa_{pp'}$ a $k$-statistic $k_{pp'}$ whose mean value is $\kappa_{pp'}$. Explicitly




y * * 3^'- vH) 

^ (Pi!)^‘ (jP!’)"** . . . :7rJ jr*! , . , 


(11.108) 


In particular, corresponding to (11.22) we have 


h 


11 (^^'-‘*’11 '^‘^ 10 '^’ 01 ) 


-2//«^I0 6*n ^^20 ’‘^01 4" .9oi) 

1 




{n^{7i -f- - 7l(tl + 1)-S30 '^Ol - ^ 71(71 1)^11 ^20 

— ^71(71 + 1)^21 ^10 + ^10^01 «?o| 

7 -rw—-ow—-q\j~ S 02 

(ra —1 )(m—2)(»—3)[ ' n n n 


2(w - 1) 


2 - 


20 *01 ”9 ®10 ^01 

n 71^ 


(11.109) 





In generalisation of the mean value functions of the $k$'s we may write, for example,

E(hi^ kxi) == 

-15(^20 ^oa) = j 2^ 

with corresponding $\kappa$'s. The latter may be expressed in terms of the cumulants of the
bivariate distribution as in the univariate case; and the coefficients will now depend on partitions
of bipartite numbers. Our rules still apply (and in particular the pattern functions
appropriate to particular arrays are the same); but the numerical coefficients associated
with completed arrays are modified, for we now have to consider the number of ways of
allocating two different sorts of individuals in a two-way partition of a bipartite number.
An example will make the modification clear.

( 2 2 1 \ 

2 ^ j j. The total degree is 

10 and, the orders of the product being 6, 2, 2, we have to consider arrays of type 

6 
2 
2 

4 4 2 10 

i.e. those we discussed above. The pattern functions are those we have already found.
For the numerical coefficients we have to regard the column totals as consisting of the two
types of object in number (2, 2), (2, 2) and (1, 1), and the row totals (3, 3), (1, 1) and (1, 1).
For instance, the array

    2   2   2 |  6
    1   1   . |  2
    1   1   . |  2
    ----------+----
    4   4   2 | 10

might be written either as

    (1,1)  (1,1)  (1,1) | (3,3)
    (0,1)  (1,0)    .   | (1,1)
    (1,0)  (0,1)    .   | (1,1)
    --------------------+------
    (2,2)  (2,2)  (1,1) | (5,5)        (11.110)

or as

    (2,0)  (0,2)  (1,1) | (3,3)
    (0,1)  (1,0)    .   | (1,1)
    (0,1)  (1,0)    .   | (1,1)
    --------------------+------
    (2,2)  (2,2)  (1,1) | (5,5)        (11.111)

each of which will make a contribution to the numerical coefficient. It will be found that
no other arrays are possible except those obtained by permuting the first two columns.
The numerical coefficient in (11.110) and the permuted array together is

( 2 ! \( 2 ! \( 2 ! \/ 2 ! \ 1 





That in (11.111) and the permuted array is 

V2!/Viri!Al* 1!A2’/’2! 


4. 


The total contribution is thus 20. The pattern function is

$$\frac{1}{(n-1)(n-2)} .$$

In the same way it will be found that for the partitions 


2 

1 

1 


4 4 2 


2 

2 

10 


4 4 2 


6 

2 

2 

10 


the coefficients are 48 and 8. Thus the desired coefficient of $\kappa_{33}\kappa_{11}^2$ is

$$\frac{20}{(n-1)(n-2)} + \frac{48}{(n-1)^2} + \frac{8}{(n-1)^2} = \frac{4(19n - 33)}{(n-1)^2(n-2)} .$$
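The reduction of the three contributions to a single fraction is easily verified with exact arithmetic. The following Python sketch is not part of the original text; it takes the numerical coefficients 20, 48 and 8 and the pattern functions $1/\{(n-1)(n-2)\}$ and $1/(n-1)^2$ from the passage above, and the function names are illustrative.

```python
from fractions import Fraction


def coefficient_sum(n):
    """Sum of numerical coefficient times pattern function for the three arrays."""
    return (Fraction(20, (n - 1) * (n - 2))
            + Fraction(48, (n - 1) ** 2)
            + Fraction(8, (n - 1) ** 2))


def closed_form(n):
    """The single fraction 4(19n - 33) / {(n-1)^2 (n-2)}."""
    return Fraction(4 * (19 * n - 33), (n - 1) ** 2 * (n - 2))


for n in (3, 10, 100):
    print(n, coefficient_sum(n), closed_form(n))
```

The two expressions coincide for every sample size $n \geq 3$ tried, as the algebra requires.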


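Example 11.6

To find an exact expression for the covariance of the estimates of variance of two
correlated variables, i.e.

$$\kappa\begin{pmatrix} 2 & 0 \\ 0 & 2 \end{pmatrix}.$$

This will clearly consist of three terms, in $\kappa_{22}$, $\kappa_{20}\kappa_{02}$ and $\kappa_{11}^2$. For the first we have
the partition

    (2,0)  (0,2) | (2,2)
    -------------+------
    (2,0)  (0,2) | (2,2)

with pattern function $\frac{1}{n}$ and numerical coefficient unity. For the second no contribution
exists, the only arrangement being

    (2,0)    .   | (2,0)
      .    (0,2) | (0,2)
    -------------+------
    (2,0)  (0,2) | (2,2)

which has a zero pattern function. For the third we have

    (1,0)  (0,1) | (1,1)
    (1,0)  (0,1) | (1,1)
    -------------+------
    (2,0)  (0,2) | (2,2)

the pattern function for which is $\frac{1}{n-1}$ and numerical coefficient 2. Hence

$$\kappa\begin{pmatrix} 2 & 0 \\ 0 & 2 \end{pmatrix} = \frac{1}{n}\kappa_{22} + \frac{2}{n-1}\kappa_{11}^2 .$$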
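The result of this example, that the covariance of $k_{20}$ and $k_{02}$ is $\kappa_{22}/n + 2\kappa_{11}^2/(n-1)$, holds exactly for any parent with finite fourth moments and can be checked by complete enumeration. The following Python sketch is not part of the original text: the four-point parent `POINTS` is hypothetical, samples are drawn independently (with replacement), and the function names are illustrative.

```python
from fractions import Fraction
from itertools import product

# Hypothetical bivariate parent: four equally likely points.
POINTS = [(0, 0), (1, 0), (1, 1), (2, 3)]


def k_second(u, v):
    """Unbiased sample product-cumulant k_11 of u and v (k_20 when u is v)."""
    n = len(u)
    suv = sum(a * b for a, b in zip(u, v))
    return Fraction(n * suv - sum(u) * sum(v), n * (n - 1))


def exact_cov_k20_k02(points, n):
    """cov(k_20, k_02) over all equally likely ordered samples of size n."""
    weight = Fraction(1, len(points) ** n)
    e_a = e_b = e_ab = Fraction(0)
    for sample in product(points, repeat=n):
        xs = [p[0] for p in sample]
        ys = [p[1] for p in sample]
        a, b = k_second(xs, xs), k_second(ys, ys)
        e_a += weight * a
        e_b += weight * b
        e_ab += weight * a * b
    return e_ab - e_a * e_b


def formula_cov(points, n):
    """kappa_22 / n + 2 kappa_11^2 / (n - 1), from the parent cumulants."""
    m = len(points)
    mx = Fraction(sum(p[0] for p in points), m)
    my = Fraction(sum(p[1] for p in points), m)

    def mu(r, s):
        # central product-moment of the parent
        return sum((p[0] - mx) ** r * (p[1] - my) ** s for p in points) / m

    kappa_11 = mu(1, 1)
    kappa_22 = mu(2, 2) - mu(2, 0) * mu(0, 2) - 2 * mu(1, 1) ** 2
    return kappa_22 / n + 2 * kappa_11 ** 2 / (n - 1)


for n in (2, 3):
    print(n, exact_cov_k20_k02(POINTS, n), formula_cov(POINTS, n))
```

Exact rational arithmetic is used so that the enumeration and the formula can be compared for equality rather than to within rounding error.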


11.26. In conclusion it may be noted that the method of expectations may be used
to derive sampling moments of the distribution of samples from a finite population. The
algebra becomes much more complex because the sample values are no longer independent
and we cannot, for example, write $E\left(\sum_{j \neq k} x_j x_k\right) = n(n-1)\mu_1'^2$. Tschuprow (1923) and
Isserlis (1931) have investigated the subject systematically, the latter giving formulae for
the first four moments of the mean and the first four of the second moment. Some of
these had been obtained earlier by Tschuprow himself, Neyman (1925) and Church (1926).
We quote the following formulae, in which $N$ represents the number in the population:


Moments of the Mean : 

. . . . 

- "i'' = ».(A- - 2)(iV - 3)«"‘ 


Moments of the Sample Variance 
E{m^ 

{n — \)N 


6AVi N + 

+ 3A^(A^ - 71 -- l){n . 


. (71 ^ 1)A' 

E(ni,) - /i. 




( 11 . 112 ) 

(11.113) 

(11.114) 

(11.115) 

(11.116) 


- (N^n - 3N^ + 6A^ -- --- 3)//^ } . (11.117) 

Church also gives the third and fourth moments of the sample variance, the formulae taking
several pages to write down. His version of the fourth moment contains errors which
were corrected by Isserlis. The statistical usefulness of these results seems to be somewhat
limited, but it would be interesting to inquire how far the combinatorial method appropriate
to $k$-statistics can be extended to the case of the finite population.*


NOTES AND REFERENCES

For earlier work on the expectations of moments see Tschuprow (1919) and Church
(1925). Thiele (1903) seems to have been the first to appreciate the possibilities of using
other symmetric functions, but owing to the fact that he defined his sample "semi-invariants"
to be the same function of the observations as the parent seminvariants
(our cumulants) are of the parent values, his formulae remained complicated. Later
investigations on similar lines were carried out by Craig (1929) and St. Georgescu (1932).

$k$-statistics were introduced by Fisher in 1928 and subsequently applied to several
problems by himself (1930) and Wishart (1928). The theory of the sampling of the statistics
has been developed by Fisher and Wishart (1930) and applications to the normal distribution
given by Wishart (1929b and 1930), E. S. Pearson (1930), and Hsu and Lawley (1939).
The general theory of the $k$-statistics and other seminvariant statistics has been discussed
by Kendall (1940a, 1940b, 1940c, 1942) and Dressel (1940). Dressel has called attention
to the relationship between seminvariant statistics and the seminvariants of the theory
of binary quantics. The reader who refers to Fisher's basic paper of 1928 should beware
of misprints. Methods of deriving bivariate formulae from univariate formulae by symbolic
processes are given by Kendall (1940c).

* See Irwin and Kendall, Ann. Eugen. Lond., 12, 138, for a derivation of these formulae from those
for an infinite population.



Reference is made in the text to the work by Tschuprow (1923) and Isserlis (1931) on
finite populations. The latter's method appears to be the simplest known at the present
time.

Church, A. E. R. (1925), "On the moments of the distributions of squared standard deviations
for samples of n drawn from an indefinitely large population," Biometrika, 17, 79.

-- (1926), "On the means and squared standard deviations of small samples from any
population," Biometrika, 18, 321.

Craig, C. C. (1929), "An application of Thiele's seminvariants to the sampling problem,"
Metron, 7, 3.

Dressel, P. L. (1940), "Statistical seminvariants and their estimates with particular emphasis
on their relation to algebraic invariants," Ann. Math. Statist., 11, 33.

Fisher, R. A. (1928), "Moments and product-moments of sampling distributions," Proc.
Lond. Math. Soc., 30, 199-238.

-- (1930), "The moments of the distribution for normal samples of measures of departure
from normality," Proc. Roy. Soc. A, 130, 16.

-- and J. Wishart (1931), "The derivation of the pattern formulae of two-way partitions
from those of simpler patterns," Proc. Lond. Math. Soc., 33, 195-208.

Hsu, C. T., and Lawley, D. N. (1939), "The derivation of the fifth and sixth moments of
b2 in samples from a normal population," Biometrika, 31, 238.

Isserlis, L. (1931), "On the moment distributions of moments in the case of samples drawn
from a limited universe," Proc. Roy. Soc., 132 A, 586.

Kendall, M. G. (1940a), "Some properties of k-statistics," Ann. Eugen. Lond., 10, 106.

-- (1940b), "Proof of Fisher's rules for ascertaining the sampling semi-invariants of
k-statistics," Ann. Eugen. Lond., 10, 215.

-- (1940c), "The derivation of multivariate sampling formulae from univariate formulae
by symbolic operation," Ann. Eugen. Lond., 10, 392.

-- (1942), "On seminvariant statistics," Ann. Eugen. Lond., 11, 300.

Neyman, J. (1925), "Contributions to the theory of small samples drawn from a finite
population," Biometrika, 17, 472.

Pearson, E. S. (1930), "A further development of tests for normality," Biometrika, 22, 239.

St. Georgescu, N. (1932), "Further contributions to the sampling problem," Biometrika,
24, 65.

Thiele, T. N. (1903), Theory of Observations, London, C. and E. Layton (reprinted in Ann.
Math. Statist., 2, 165).

Tschuprow, A. A. (1919), "On the mathematical expectation of the moments of frequency
distributions," Biometrika, 12, 140 and 185, and (1921) 13, 283.

-- (1923), "On the mathematical expectation of moments of frequency distributions,"
Metron, 2, 461 and 646.

Wishart, J. (1928), "A problem in combinatorial analysis giving the distribution of certain
moment statistics," Proc. Lond. Math. Soc., 29, 309.

-- (1929b), "The correlation between product moments of any order in samples from
a normal population," Proc. Roy. Soc. Edin., 49, 78.

-- (1930), "The derivation of certain high-order sampling product-moments from a normal
population," Biometrika, 22, 224.

-- (1933), "A comparison of the semi-invariants of the distributions of moment and
semi-invariant estimates in samples from an infinite population," Biometrika,
25, 52.



EXERCISES

11.1. Show that the pattern functions of the following patterns:

    x  x  x          x  x  x  x
    x  .  x          .  .  x  x
    x  x  .          x  x  .  .

are

$$\frac{1}{(n-1)^2} \quad \text{and} \quad \frac{1}{n(n-1)^2}$$

respectively.

11.2. Show that the pattern function of the pattern 


(Fisher, 1928.) 


with p-columns is 

11.3. Verify the formulae of equations (11.33) and (11.39). 


X 

X X 

X 

X X 

1 



t {n 


(Fisher, 1928.) 


11.4. Show that the generating function of the moments of the $k$-statistics,

Till 


is given by 




l^exp {tiKx + UKt + ttKt -I- . . .} exp . .|J 

where K,. is the same function of the operators ^ as is of the observations x and s,, == Z{x^.) 

Deduce that 

^pi^p) ^ jP* 

• • •) = 0 

where {piPt . . .) i® partition of p. (Fisher, 1928.) 

Note that if M{z) is the moment-generating function of x, the mean value of 

f -/(») "ill be »nd that of P by 

Hence that the generating function of the moments of f may be written 


1 11.5. Show that 


and hence that 


^tkpixi +h, Xt, . . . a;„)} ^Kp-^-hP 


Spkp^pl 

Sjcp = 0 q 9^ p 


where 8p is the same function of the operators ^ as 5^ is of the observations x. 


(Kendall, 1940a.) 





11.6. Show that the generating function of the moments of the $k$-statistics is given by




+ . 


+ lCit% + • 



and hence derive the result of the previous exercise. 


(Kendall, 1942.) 


11.7. Use Exercise 11.5 to show that, in the expression of in terms of the sym¬ 
metric sums 8, the sum of the coeflBicients is i. 

n 

Show similarly that if 


then 


8p 1 4 “ • . • + ^Pip, * * * 

A A A 

1 =: ZZB -f ^ ^ ^ ^ -1, ^Pt • * * -1- ^ ^ ^ 

n n'-* ' * * 


(Kendall, 1940a.) 


11.8. Referring to the result of 11.22, show that for a normal parent population the 


1 


. a/') is to give pattern functions -times 


effect of adding a new part 2 to . 

those of the original. Show also that the effect on the numerical coefficient in an array 
is to multiply by twice the number of rows in the array. Deduce that the effect of adding 
a new part 2 is equivalent to operating by 


2k | 

n — i dK^ 

(Fisher and Wishart, 1930.) 


11.9. Use the previous exercise to establish equations (11.67) and (11.68). 

11.10. In generalisation of Exercise 11.8 show that for a multivariate normal parent 
the effect of adding a covariance (p, q referring to the pth and qth variates) is equivalent 
to operating by 

where is the covariance of the variates p and q. 

(Fisher and Wishart, 1930.) 


11.11. If a pattern contains a column with three entries; if the patterns obtained
by suppressing this column and (1) amalgamating the three rows, (2) amalgamating the
pairs of three rows, and (3) leaving the rows unamalgamated are $A$, $B_1$, $B_2$, $B_3$ and $C$ respectively,
show that the pattern function for the original pattern is

$$\frac{n}{(n-1)(n-2)}A - \frac{1}{(n-1)(n-2)}(B_1 + B_2 + B_3) + \frac{2}{n(n-1)(n-2)}C .$$





Deduce that the function of the pattern 

XXX 
XXX 
y X X 
X X 


IS 


+ J7n + 2 

(n ~ - 3)* 


(Fisher and Wishart, 1930.) 


11.12. Show that 


and 


(! 

■e 9 




Kti + - 


2 


.2 


and hence that if p — - 77 -- , and r ~ 

"V (^20^02) V (”*20^"o2) 

then to order 7r^ for normal samples 

vai r -=r ^(1 —• 
n 

11.13. Show that for a bivariate normal population and k^ have zero covariance 
unless t u — V -i- w, 

(Wishart, 19296 ) 


11.14. Show that for a bivariate normal population 

1 




2pT^T, ^ .rf 
0 ^ 10-2 


var 




1 )! 






where is thejth difference of the Ath power of zero and F refers to the hypergeometiic 
function. 

(Wishart, 19296.) 

11.15. Use the methods of this chapter to verify that to order 
var [m^) = -(/^g - //f h 

Ifh 


11.16. Referring? to Exercise 11.4, show that the moment-generating function 

A*. A/ 4 ICf 

’ ' ' kl\' ' ' 


M'ixt, T„ . . of the statLstics kt, 


A. ’ A.-r 

is given symbolically Jjy 

Jf'(T„ T, . . .) = exp|Tji<:, - 






^3 • • •)> 





where Jf (^i, <«...) moment-generating function of the it-statistios and is the same 

3 

function of ^ as is of x. 

Noting that in normal samples the distribution of is independent of that of the other 

( 2k t 

1 - - j show that 


J? + 


M'{xx, T, . . .) = exp^i 

Hence, if the number r refers to that 

. . 5'-463“) = /x(. . . 6«4ft3“2-0f-f 

dP\ 

where 2j = a + 6 + + . 

and hence that /x(. . . 6«4*3“) = /x(. . . 6”4«’3“2 - 

Deduce that 


X'-r^^y 


i(n~l) 




2Kd 


n 




1)(M + 1) . 


(n - 1 ) 


+ 2j -—3) ^ 


$$\operatorname{var}\left(\frac{k_3}{k_2^{3/2}}\right) = \frac{6n(n-1)}{(n-2)(n+1)(n+3)},$$

$$\mu_4\left(\frac{k_3}{k_2^{3/2}}\right) = \frac{108n^2(n-1)^2(n^2 + 27n - 70)}{(n-2)^3(n+1)(n+3)(n+5)(n+7)(n+9)} .$$

Hence verify the formulae of equations (11.104) and (11.105). (This remarkable result
is due to Fisher (1930). The independence of $k_2$ and the other statistics may be seen by
considering the $n$-fold sample space, $k_2$ appearing as the square of a length and the others
as angles (cf. Geary (1933), Biometrika, 25, 184).)


y 


lin- 1 )(^/ 
V 24/2 ( 


11.17. Defining y by the relation 

2 )(^ - 3) k_, 

24/2(72 + 1) 

show that the moments of the distribution of y m samples from a normal population are 

//i - 0 

.532 


2^2 


1 - 

n 


jM. = 3 4 


, 65 , 4811 

1 -- 1 

2n bn 


1 _ 13 6,60 5 \ 

“ ' 16to» ■ ■ 7 


4 1 - 

n u* n* 


(E. S. Pearson, 1930, by the method of 11.23, before the exact results of the previous
exercise had been given. He fitted a Pearson Type IV to the distribution by using these
moments, and tabulated the 1 per cent and 5 per cent significance points.)





CHAPTER 12 


THE $\chi^2$-DISTRIBUTION

12.1. Among the sampling distributions of current statistical theory the normal
distribution is perhaps of widest application in virtue of the tendency of many statistics
to normality in large samples, irrespective of the nature of the parent population. There
is one other distribution, closely related to the normal, which has a somewhat similar general
applicability and we give an account of it in this chapter.

Suppose we have a number of compartments or cells determined by specified ranges
of a variate-value or by some qualitative character, such as the intervals of a univariate
frequency-distribution, the cells of a bivariate distribution, or a simple classification of
individuals into two classes, A and not-A. Suppose these cells are filled by random
sampling from a parent population and that in the parent the proportion of members in
the $j$th cell is $\pi_j$. In a sample of $n$ there will occur proportions of, say, $p_j$ in the $j$th cell,
the observed numbers being accordingly $np_j$. If the sampling were such as to give an
exact representation of the parent these numbers would be $n\pi_j$. Our fundamental problem
is to determine how far the $p_j$ can, to any acceptable degree of probability, diverge from
the $\pi_j$ by random sampling fluctuations. We shall then be able to test the accuracy
of the hypothesis on which the $\pi$'s were determined.

A few examples from material occurring in earlier chapters will illustrate the problem.
In Table 5.1 on page 117 were given the actual occurrences of throws of dice, $n$ being
26,306, the cells being eleven in number according to the number of "successes," and a
third column showing the theoretical frequencies based on the hypothesis that the sampling
obeyed a binomial law. The observed frequencies are our $np$'s and the theoretical
frequencies our $n\pi$'s. The question is, are the differences between the two such as can
have arisen by sampling fluctuations alone? If not, then we must reject our hypothesis
as to the generation of these dice-throws according to the binomial law.

Again, in the table of Example 8.6 were shown some results of inoculation against
cholera. The question which interests us here is whether inoculation does in fact restrict
or prevent attack. If it does not, we expect to find the same proportion of attacked in the
inoculated class as in the not-inoculated class, e.g. the number of attacked in the former
would be … × 279 = 23.5 approximately as against an observed 3. The former number
is an $n\pi$ and the latter an $np$. Once again we should examine the differences, and if they
were large enough to be inexplicable on the basis of sampling alone, should reject the
hypothesis of independence of inoculation and attack, concluding that inoculation was to some
extent preventive.

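12.2. Consider then samples of $n$ with a division of the possible classification into
$p$ cells; and suppose the members distributed simply at random in these cells.

Then the probability of there being $l_1$ members in the first cell, $l_2$ in the second, and
so forth, is the term in $\pi_1^{l_1}\pi_2^{l_2}\cdots\pi_p^{l_p}$ in the multinomial form

$$(\pi_1 + \pi_2 + \dots + \pi_p)^n,$$

namely

$$T = \frac{n!}{l_1!\,l_2!\cdots l_p!}\,\pi_1^{l_1}\pi_2^{l_2}\cdots\pi_p^{l_p}. \tag{12.1}$$

If the $l$'s are not small we have, in virtue of Stirling's approximation to the factorial,

$$T = \frac{n^{n+\frac12}e^{-n}\sqrt{2\pi}}{\prod_j\{l_j^{\,l_j+\frac12}e^{-l_j}\sqrt{2\pi}\}}\,\prod_j\pi_j^{l_j}, \tag{12.2}$$

and since $n = \Sigma l$ this becomes

$$T = \frac{1}{(2\pi n)^{\frac12(p-1)}\prod_j\pi_j^{\frac12}}\,\prod_j\left(\frac{n\pi_j}{l_j}\right)^{l_j+\frac12}. \tag{12.3}$$

Now put

$$\lambda_j = n\pi_j$$

and

$$\xi_j = \frac{l_j - n\pi_j}{\sqrt{n\pi_j}} = \frac{l_j - \lambda_j}{\sqrt{\lambda_j}}.$$

Then from (12.3) we have

$$\log T - \log(\text{constant}) = \Sigma\,(l_j + \tfrac12)\log\frac{\lambda_j}{\lambda_j + \xi_j\sqrt{\lambda_j}} = -\Sigma\,(\lambda + \tfrac12 + \xi\sqrt{\lambda})\log\left(1 + \frac{\xi}{\sqrt{\lambda}}\right).$$

If $\lambda$ is large, $\xi\sqrt{\lambda}$ will be small compared with $\lambda$ and to first order we have, expanding the
logarithm,

$$\log T - \log(\text{constant}) = -\Sigma\left\{(\lambda + \tfrac12 + \xi\sqrt{\lambda})\left(\frac{\xi}{\sqrt{\lambda}} - \frac{\xi^2}{2\lambda} + \dots\right)\right\} = -\Sigma\{\xi\sqrt{\lambda} + \tfrac12\xi^2 + O(\lambda^{-\frac12})\}. \tag{12.4}$$

Now

$$\Sigma\,\xi\sqrt{\lambda} = \Sigma\,(l - \lambda) = 0$$

and hence, to order $\lambda^{-\frac12}$,

$$\log T - \log(\text{constant}) = -\tfrac12\Sigma\,\xi^2, \qquad T \propto \exp(-\tfrac12\Sigma\,\xi^2). \tag{12.5}$$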


Hence the frequency $T$ varies as the joint frequency of $p$ normal variates of unit
variance which are subject to the constraint $\Sigma(\xi\sqrt{\lambda}) = 0$ but otherwise independent.
Now put

$$\chi^2 = \Sigma\,\xi^2 = \Sigma\,\frac{(l - \lambda)^2}{\lambda}. \tag{12.6}$$

Then the frequency of $\chi^2$ is that of the sum of squares of $(p - 1)$ independent normal variates
of unit variance. Its distribution is then, from Example 10.4, given by

$$dF \propto e^{-\frac12\chi^2}\chi^{p-2}\,d\chi, \tag{12.7}$$

$$dF = \frac{1}{2^{\frac12(p-3)}\Gamma\{\tfrac12(p-1)\}}\,e^{-\frac12\chi^2}\chi^{p-2}\,d\chi. \tag{12.8}$$




Furthermore, if there are certain constraints on the cell frequencies expressible by
$\kappa$ linear equations among them, the distribution remains of the same form but is now

$$dF = \frac{1}{2^{\frac12(\nu-2)}\Gamma(\tfrac12\nu)}\,e^{-\frac12\chi^2}\chi^{\nu-1}\,d\chi, \tag{12.9}$$

where $\nu$, known as the number of degrees of freedom, is $p - \kappa$, that is the number of cells
whose frequencies can be assigned without restriction. (Cf. Example 10.4.)


12.3. The distribution (12.9) is usually known as the $\chi^2$-distribution, though it is
actually that of $\chi$. However, $\chi^2$ and not $\chi$ is the quantity which always occurs in practical
calculations and most tables of the distribution function have $\chi^2$ as argument. (12.9)
is only an approximation, and relies on the fact that for large $\lambda$ the Stirling approximation
to the factorial will hold and that deviations from theoretical $\lambda$'s are negligible to order $n^{-\frac12}$.
In point of fact, the approximation is very good and the $\chi^2$-distribution may confidently
be applied when the theoretical cell frequencies are, say, not less than 20.

Before dealing with the applications of the above results we will consider in more
detail the properties of the distribution.
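The approximation of 12.2 can be checked empirically. The following is a minimal sketch, not part of the original text: it draws repeated multinomial samples and computes the sum of $(l-\lambda)^2/\lambda$ for each. The cell proportions, sample size, seed and function names are all assumptions chosen for illustration; the smallest theoretical frequency is 20. For $p = 4$ cells the average of the simulated values should lie near $p - 1 = 3$, the mean of the limiting distribution.

```python
import random

def chi2_statistic(counts, proportions, n):
    """Sum over cells of (l - lambda)^2 / lambda, with lambda = n * pi_j."""
    return sum((l - n * p) ** 2 / (n * p) for l, p in zip(counts, proportions))

def simulate(proportions, n, repeats, seed=12345):
    """Draw `repeats` multinomial samples of size n; return their chi-square values."""
    rng = random.Random(seed)          # fixed seed (an arbitrary choice) for repeatability
    cells = range(len(proportions))
    values = []
    for _ in range(repeats):
        counts = [0] * len(proportions)
        for j in rng.choices(cells, weights=proportions, k=n):
            counts[j] += 1
        values.append(chi2_statistic(counts, proportions, n))
    return values

# Hypothetical parent proportions; theoretical frequencies are 20, 40, 60, 80.
values = simulate([0.1, 0.2, 0.3, 0.4], n=200, repeats=2000)
mean_value = sum(values) / len(values)
print(round(mean_value, 2))
```

The exact expectation of the statistic is $p - 1$ for any $n$, so the mean alone does not test the shape of the distribution; a histogram of `values` against (12.10) would be the natural next step.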


Properties of the $\chi^2$-Distribution

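12.4. Writing $\chi^2 = \zeta$ we have for the distribution

$$dF = \frac{1}{2^{\frac12\nu}\Gamma(\tfrac12\nu)}\,e^{-\frac12\zeta}\zeta^{\frac12\nu-1}\,d\zeta, \qquad 0 \le \zeta < \infty, \tag{12.10}$$

a Pearson Type III distribution. The characteristic function is

$$\phi(t) = \frac{1}{(1 - 2it)^{\frac12\nu}}, \tag{12.11}$$

whence, for the cumulants, we have

$$\kappa_r = \nu\,2^{r-1}(r - 1)!, \tag{12.12}$$

and for moments about the mean

$$\begin{aligned}
\mu_2 &= 2\nu \\
\mu_3 &= 8\nu \\
\mu_4 &= 48\nu + 12\nu^2 \\
\mu_5 &= 32\nu(5\nu + 12) \\
\mu_6 &= 40\nu(3\nu^2 + 52\nu + 96).
\end{aligned} \tag{12.13}$$

Since $\kappa_r$ is linear in $\nu$, $\mu_r$, which can contain only powers of the $\kappa$'s, must be of degree
$\tfrac12 r$ in $\nu$ if $r$ is even and $\tfrac12(r - 1)$ if $r$ is odd.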
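The moments (12.13) can be checked mechanically from the cumulants (12.12). The sketch below is an illustration rather than the author's derivation: it uses the standard recursion between moments and cumulants, with the first cumulant set to zero so that moments about the mean are obtained. The function names are illustrative.

```python
from math import comb, factorial

def chi2_cumulant(r, nu):
    """kappa_r = nu * 2^(r-1) * (r-1)!, equation (12.12)."""
    return nu * 2 ** (r - 1) * factorial(r - 1)

def central_moments(nu, max_order):
    """Moments about the mean, mu_0 .. mu_max_order, built from the cumulants.

    Uses mu_n = sum_{k=1..n} C(n-1, k-1) * kappa_k * mu_{n-k},
    with kappa_1 replaced by zero (origin moved to the mean).
    """
    kappa = [0, 0] + [chi2_cumulant(r, nu) for r in range(2, max_order + 1)]
    mu = [1] + [0] * max_order
    for n in range(1, max_order + 1):
        mu[n] = sum(comb(n - 1, k - 1) * kappa[k] * mu[n - k]
                    for k in range(1, n + 1))
    return mu

nu = 5
mu = central_moments(nu, 6)
print(mu[2:])  # compare with 2v, 8v, 48v + 12v^2, 32v(5v + 12), 40v(3v^2 + 52v + 96)
```

Integer arithmetic is used throughout, so the comparison with (12.13) is exact for any integral $\nu$.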


As $\nu$ tends to infinity the $\chi^2$-distribution tends to normality, for in standard measure
we have

$$\phi(t) = e^{-\nu it/\sqrt{2\nu}}\left\{1 - \frac{2it}{\sqrt{2\nu}}\right\}^{-\frac12\nu},$$

$$\log\phi(t) = -\frac{\nu it}{\sqrt{2\nu}} + \frac{\nu}{2}\left\{\frac{2it}{\sqrt{2\nu}} + \frac12\left(\frac{2it}{\sqrt{2\nu}}\right)^2 + \dots\right\} \to -\tfrac12 t^2.$$

The tendency is, however, rather slow, and there are better approximations as we shall see
in a moment.


12.5. The frequency curves given by (12.9) extend from zero to infinity. In the
case $\nu = 1$ the curve is merely the positive half of the normal curve. In other cases it is
zero at the origin, rises to a mode and then falls away again towards infinity. The maximum ordinate
of the $\chi$-distribution (not the $\chi^2$-distribution) is given by

$$\frac{d}{d\chi}\left(e^{-\frac12\chi^2}\chi^{\nu-1}\right) = 0,$$

namely, by

$$\chi^2 = \nu - 1, \tag{12.14}$$

and that of the $\chi^2$-distribution or $\zeta$-distribution by

$$\frac{d}{d\zeta}\left(e^{-\frac12\zeta}\zeta^{\frac12\nu-1}\right) = 0,$$

namely, by

$$\zeta = \nu - 2. \tag{12.15}$$

The skewness of the $\zeta$-distribution in the form (mean − mode)/(standard deviation) is then

$$\frac{\nu - (\nu - 2)}{\sqrt{2\nu}} = \sqrt{\frac{2}{\nu}}.$$

12.6. The distribution function of (12.10) is an incomplete $\Gamma$-function. We have

$$F(\chi^2) = \frac{1}{2^{\frac12\nu}\Gamma(\tfrac12\nu)}\int_0^{\chi^2} e^{-\frac12\zeta}\zeta^{\frac12\nu-1}\,d\zeta = \frac{\Gamma_{\frac12\chi^2}(\tfrac12\nu)}{\Gamma(\tfrac12\nu)},$$

or, in the notation of Pearson's tables,

$$F(\chi^2) = I\left(\frac{\chi^2}{\sqrt{2\nu}},\ \tfrac12(\nu - 2)\right). \tag{12.16}$$

Some special tables have, however, been constructed.

(a) Elderton's table (Tables for Statisticians and Biometricians, Part I) gives the
values of $P = \int_{\chi^2}^{\infty} dF = 1 - F(\zeta) = 1 - F(\chi^2)$ for values of $\nu$ = 2 (1) 29 and $\chi^2$ = 1 (1) 30;
30 (10) 70. In this table, which is to six places, our $\nu$ is denoted by $n' - 1$.

(b) A table by Yule, reproduced at the end of this volume, supplements Elderton's
by giving $P$ for $\nu = 1$, $\chi^2$ = 0 (0.01) 1; 1 (0.1) 10.

(c) Kelley (1938) gives a four-place table of P for from 0 ( 0 - 1 ) 4-1 and v~ 1 ( 1 ) 10 ; 
12, 15, 19, 30. 

(d) Fisher and Yates (1938) give tables in an inverted form, showing the values of
$\chi^2$ for certain values of $P$ and $\nu$, namely $P$ = 0.99, 0.98, 0.95, 0.90 (0.10) 0.10, 0.05, 0.02,
0.01 and 0.001; and $\nu$ = 1 (1) 30.

(e) Thompson (Biometrika, 32 (1941), 187) gives tables in the inverted form for
$P$ = 0.995, 0.990, 0.975, 0.950, 0.750, 0.500, 0.250, 0.100, 0.050, 0.025, 0.010, 0.005 and
$\nu$ = 1 (1) 30 (10) 100.

For general use the incomplete $\Gamma$-function tables are probably the best, as interpolation
in Elderton's table does not give very great accuracy. The significance points tabled by
Fisher and Yates are, however, sufficient for many practical applications of the
$\chi^2$-distribution in carrying out statistical tests. We reproduce at the end of the volume a diagram
which will serve for such purposes. It shows, for co-ordinates $\nu$ and $\chi^2$, the curves
$P$ = constant, so that for given $\nu$ and $\chi^2$ it is easy to determine whether $P$ falls between
any of the values for which the curves are drawn.


12.7. Except for Thompson's table, the tables do not cover the region for which
$\nu > 30$, and for such values an approximation to the normal distribution may be
employed. There are two such in common use:

(a) (Fisher) that $\sqrt{2\chi^2}$ is normally distributed about mean $\sqrt{2\nu - 1}$ with unit
variance;

(b) (Wilson and Hilferty (1931)) that $\left(\dfrac{\chi^2}{\nu}\right)^{\frac13}$ is normally distributed about mean $1 - \dfrac{2}{9\nu}$
with variance $\dfrac{2}{9\nu}$. The second is more accurate, but involves more arithmetical work in
applications.
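The two approximations can be turned into formulae for percentage points. The sketch below is illustrative only: it inverts each normal approximation to obtain the value of $\chi^2$ exceeded with probability $P$, using the normal deviate from Python's standard library. The function names are assumptions, and $P$ is taken as the upper-tail probability, as in the tables described in 12.6.

```python
from statistics import NormalDist

def chi2_point_fisher(p_upper, nu):
    """Fisher: sqrt(2 chi^2) is taken as normal about sqrt(2 nu - 1), unit variance."""
    z = NormalDist().inv_cdf(1.0 - p_upper)
    return 0.5 * (z + (2 * nu - 1) ** 0.5) ** 2

def chi2_point_wilson_hilferty(p_upper, nu):
    """Wilson-Hilferty: (chi^2/nu)^(1/3) is taken as normal, mean 1 - 2/(9 nu), variance 2/(9 nu)."""
    z = NormalDist().inv_cdf(1.0 - p_upper)
    c = 2.0 / (9.0 * nu)
    return nu * (1.0 - c + z * c ** 0.5) ** 3

for nu in (40, 60, 80, 100):
    print(nu,
          round(chi2_point_fisher(0.99, nu), 3),
          round(chi2_point_wilson_hilferty(0.99, nu), 3))
```

Halving these values for $\nu = 40$ and $P = 0.99$ gives figures that may be set against the Fisher and Wilson-Hilferty entries of Table 12.1, which tabulates $\tfrac12\chi^2$.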


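The relative speed of approach to normality of $\chi^2$ and $\sqrt{2\chi^2}$ may be compared as
follows:

For $\chi^2$ we have, from (12.13),

$$\gamma_1 = \frac{\mu_3}{\mu_2^{3/2}} = \frac{2\sqrt2}{\sqrt\nu}, \tag{12.17}$$

$$\gamma_2 = \frac{\mu_4}{\mu_2^2} - 3 = \frac{12}{\nu}. \tag{12.18}$$

For the moments of $\chi$ we have

$$\mu_r' = \frac{1}{2^{\frac12(\nu-2)}\Gamma(\tfrac12\nu)}\int_0^{\infty} e^{-\frac12\chi^2}\chi^{r+\nu-1}\,d\chi = \frac{2^{\frac12 r}\,\Gamma\{\tfrac12(\nu + r)\}}{\Gamma(\tfrac12\nu)}. \tag{12.19}$$

Thus

$$\mu_1' = \frac{\sqrt2\,\Gamma\{\tfrac12(\nu + 1)\}}{\Gamma(\tfrac12\nu)}.$$

Using the expansion

$$\log\Gamma(x + 1) = \tfrac12\log(2\pi) + (x + \tfrac12)\log x - x + \frac{1}{12x} - \dots,$$

an extended form of Stirling's formula, we find after substitution and reduction

$$\mu_1' = \sqrt\nu\left(1 - \frac{1}{4\nu} + \frac{1}{32\nu^2} + \dots\right), \quad \mu_2' = \nu, \quad \mu_3' = (\nu + 1)\mu_1', \quad \mu_4' = (\nu + 2)\nu,$$

whence we find for moments about the mean

$$\mu_2 = \frac12 - \frac{1}{8\nu} + \dots, \quad \mu_3 = \frac{1}{4\sqrt\nu} + \dots, \quad \mu_4 = \frac34 - \frac{3}{8\nu} + \dots$$

Hence for the constants of $\sqrt{2\chi^2}$ we have

$$\mu_2 = 1 - \frac{1}{4\nu} + O(\nu^{-2}),$$

$$\gamma_1 = \frac{1}{\sqrt{2\nu}} + \dots, \tag{12.20}$$

$$\gamma_2 = O(\nu^{-2}). \tag{12.21}$$

A comparison with (12.17) and (12.18) shows that $\sqrt{2\chi^2}$ tends to normality with
considerably greater rapidity than $\chi^2$. Moreover, the expression for $\mu_1'$ of $\chi$ agrees with $\sqrt{\nu - \tfrac12}$
up to terms of order $\nu^{-3/2}$, and hence $\sqrt{2\chi^2}$ is distributed about mean $\sqrt{2\nu - 1}$ to that accuracy, with
variance which differs from unity only by a term of order $\nu^{-1}$.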


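12.8. For the Wilson-Hilferty approximation, consider the distribution of $\chi^2$ about
its mean value $\nu$. Let us find the distribution of $\left(\dfrac{\chi^2}{\nu}\right)^h$, $h$ as yet being undetermined.
Write $\xi = \chi^2 - \nu$. Then

$$\left(\frac{\chi^2}{\nu}\right)^h = y, \text{ say,} = \left(1 + \frac{\xi}{\nu}\right)^h = 1 + h\frac{\xi}{\nu} + \frac{h(h-1)}{2!}\frac{\xi^2}{\nu^2} + \dots$$

Taking mean values and using the results of (12.13), we find, after some reduction,

$$\mu_1'(y) = 1 + \frac{h(h-1)}{\nu} + \frac{h(h-1)(h-2)(3h-1)}{6\nu^2} + \frac{h^2(h-1)^2(h-2)(h-3)}{6\nu^3} + O(\nu^{-4}). \tag{12.22}$$

If in this we put $rh$ for $h$ we find the mean value of $y^r$ and thus

$$\mu_r'(y) = 1 + \frac{rh(rh-1)}{\nu} + \text{etc.} \tag{12.23}$$

We now choose $h$ so that the third term in (12.22) vanishes, i.e. we take $h = \tfrac13$. We
then find

$$\begin{aligned}
\mu_1'(y) &= 1 - \frac{2}{9\nu} + \frac{80}{3^7\nu^3} + \dots \\
\mu_2'(y) &= 1 - \frac{2}{9\nu} + \frac{4}{3^4\nu^2} + \frac{56}{3^7\nu^3} + \dots \\
\mu_3'(y) &= 1 \\
\mu_4'(y) &= 1 + \frac{4}{9\nu} - \frac{4}{3^3\nu^2} + \frac{80}{3^7\nu^3} + \dots
\end{aligned}$$

or, transferring to the mean,

$$\begin{aligned}
\mu_2 &= \frac{2}{9\nu} - \frac{104}{3^7\nu^3} + \dots \\
\mu_3 &= \frac{32}{3^6\nu^3} + \dots \\
\mu_4 &= \frac{4}{3^3\nu^2} - \frac{16}{3^6\nu^3} + \dots
\end{aligned} \tag{12.24}$$

We now find

$$\gamma_1 = \frac{2^{7/2}}{27\,\nu^{3/2}}, \tag{12.25}$$

$$\gamma_2 = -\frac{4}{9\nu}. \tag{12.26}$$

Comparison with (12.17), (12.18), (12.20) and (12.21) shows that $\left(\dfrac{\chi^2}{\nu}\right)^{\frac13}$ tends to symmetry,
as measured by $\gamma_1$, more rapidly than either $\chi^2$ or $\sqrt{2\chi^2}$. To order $\nu^{-2}$ the variance is,
from (12.24), equal to $\dfrac{2}{9\nu}$ and the mean $1 - \dfrac{2}{9\nu}$. The result may also be expressed by
saying that

$$\left\{\left(\frac{\chi^2}{\nu}\right)^{\frac13} - 1 + \frac{2}{9\nu}\right\}\sqrt{\frac{9\nu}{2}} \tag{12.27}$$

is distributed normally about zero mean with unit variance.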


is distributed normally about zero mean with unit variance. 

The following table, quoted from Garwood (1936), shows some comparisons of the two
approximations with the actual values. In each case there have been worked out the values
of $\tfrac12\chi^2$ corresponding to $P = \int_{\chi^2}^{\infty} dF$ of 0.01, 0.05, 0.95 and 0.99. The exact values are
denoted by $m_E$, those given by Fisher's approximation by $m_F$ and those by Wilson and Hilferty's
approximation by $m_{WH}$.

TABLE 12.1

Comparison of Approximations to the $\chi^2$ Integral.
(From Garwood, Biometrika, 28, 437.)

              ν      m_E       m_F     m_E - m_F    m_WH    m_E - m_WH

P = 0.99     40    11.082    10.764     0.318     11.070      0.012
             60    18.742    18.414     0.328     18.732      0.010
             80    26.770    26.436     0.334     26.761      0.009
            100    35.032    34.694     0.338     35.025      0.007

P = 0.95     40    13.255    13.116     0.139     13.254      0.001
             60    21.594    21.455     0.139     21.594      0.000
             80    30.196    30.056     0.140     30.196      0.000
            100    38.965    38.825     0.140     38.965      0.000

P = 0.01     42    33.103    32.700     0.403     33.113     -0.010
             62    45.401    45.003     0.398     45.409     -0.008
             82    57.347    56.953     0.394     57.355     -0.008
            102    69.067    68.676     0.391     69.074     -0.007

P = 0.05     42    29.062    28.919     0.143     29.060      0.002
             62    40.691    40.548     0.143     40.689      0.002
             82    52.069    51.926     0.143     52.068      0.001
            102    63.287    63.144     0.143     63.286      0.001

The $m_{WH}$ approximation is evidently very good and the $m_F$ approximation is fair.

12.9. A third method of approximation may be obtained by the method of
6.32 and 6.33, and in fact our Example 6.5 was virtually based on the $\chi^2$-distribution.
From equation (6.75) with $l_1 = l_2 = 0$, $l_3 = \dfrac{2\sqrt2}{\sqrt\nu}$, $l_4 = \dfrac{12}{\nu}$, $l_5 = \dfrac{48\sqrt2}{\nu^{3/2}}$,
we find from (6.73) or (6.75) a normal deviate whose distribution function approximates
to that of $\chi^2$.

Examples of the Use of the $\chi^2$-Distribution

12.10. We now proceed to consider some examples of the use of the $\chi^2$-distribution
in comparing observation and hypothesis.

Example 12.1

In Table 12.2 we repeat some of the material of Table 5.4, giving the distribution of
first hands at whist according to the number of trumps held. The observed frequencies
are our $np_j = l_j$. The theoretical figures $\lambda$ are based on the hypothesis that the distribution
follows the hypergeometric series. The last column shows the contributions $\dfrac{(l - \lambda)^2}{\lambda}$ to
the total $\chi^2$. To avoid small frequencies we have grouped together those frequencies
of 7 or over.



TABLE 12.2

Distribution of First Hands at Whist according to Number of Cards held of a given Suit.
Hands Dealt but not Played.

Number of Cards.    Observed Frequency, l.    Theoretical Frequency, λ.    (l - λ)²/λ

0                            35                        43.5                   1.661
1                           290                       272.2                   1.164
2                           696                       700.0                   0.023
3                           937                       973.5                   1.369
4                           851                       811.3                   1.943
5                           444                       424.0                   0.943
6                           115                       141.3                   4.895
7 and over                   32                        34.2                   0.142

Total                      3400                      3400.0                  12.140

The total is seen to be 12.140. The number of degrees of freedom $\nu$ is one fewer
than the number of cells (excluding the total), namely 7. From the diagram in the appendix
it is seen that the probability of getting a value as great as this or greater on random
sampling $\left(P = \int_{\chi^2}^{\infty} dF\right)$ lies between 0.1 and 0.05, very close to the former. The odds
are therefore about 9 to 1 against getting the observed frequencies or frequencies which
diverge to a greater extent from theoretical frequencies. This is hardly great enough for
us to be able to say that an improbable event has occurred, and therefore we need not
reject the hypothesis that the observed frequencies did in fact arise according to the
hypergeometric law and that the discrepancies are merely sampling effects.
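The arithmetic of Example 12.1 can be reproduced as follows. This is a minimal sketch, not the method of the text, which reads $P$ from a diagram: here $P$ is computed by building the upper tail of (12.10) from the cases $\nu = 1$ and $\nu = 2$, adding one term for every two further degrees of freedom. The function names are illustrative assumptions; the frequencies are those of Table 12.2.

```python
import math

def chi2_statistic(observed, theoretical):
    """Sum over cells of (l - lambda)^2 / lambda."""
    return sum((l - lam) ** 2 / lam for l, lam in zip(observed, theoretical))

def chi2_upper_tail(x, nu):
    """P = probability that chi-square with nu degrees of freedom exceeds x."""
    if x <= 0:
        return 1.0
    half = x / 2.0
    if nu % 2 == 1:
        q, a = math.erfc(math.sqrt(half)), 0.5   # nu = 1: twice the normal tail
    else:
        q, a = math.exp(-half), 1.0              # nu = 2: simple exponential
    # Each step raises nu by 2: Q(a + 1) = Q(a) + half^a e^(-half) / Gamma(a + 1)
    while 2 * a < nu:
        q += math.exp(a * math.log(half) - half - math.lgamma(a + 1.0))
        a += 1.0
    return q

observed = [35, 290, 696, 937, 851, 444, 115, 32]
theoretical = [43.5, 272.2, 700.0, 973.5, 811.3, 424.0, 141.3, 34.2]

statistic = chi2_statistic(observed, theoretical)
nu = len(observed) - 1            # one linear constraint: the totals agree
p_value = chi2_upper_tail(statistic, nu)
print(round(statistic, 3), nu, round(p_value, 3))
```

The statistic differs from the tabulated 12.140 only in the third decimal place, because the table sums contributions already rounded to three places.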

The matter stands differently with the distribution of Table 12.3, showing the distribution
of 25,000 deals of whist classified in the same way as that of Table 12.2.

TABLE 12.3

Distribution of First Hands at Whist according to Number of Cards held of a given Suit.
Hands actually Played.

Number of Cards.    Observed Frequency, l.    Theoretical Frequency, λ.    (l - λ)²/λ

0                           215                        320                   34.453
1                         1,724                      2,002                   38.603
2                         5,262                      5,147                    2.569
3                         7,440                      7,158                   11.110
4                         6,371                      5,965                   27.634
5                         2,950                      3,117                    8.947
6                           852                      1,039                   33.656
7                           166                        220                   13.255
8 and over                   20                         31                    3.903

Total                    25,000                     24,999                  174.130

Here $\chi^2 = 174.130$ and $\nu = 8$ (one more than in the previous example because we have grouped only those
frequencies of 8 and over). From the diagram it is clear that the chance of getting such
a value or a greater one is exceedingly small, certainly less than 1 in 10,000. This very
rare event leads us to reject the hypothesis that the hypergeometric distribution is operating.
The explanation is probably that these deals were taken from actual play, whereas those
of the previous table were obtained without actually playing the hands. It is evident
that in card play certain kinds of card (e.g. those of the same suit) tend to collect together
and the shuffling is apt to be somewhat perfunctory. Thus the condition of realisation of
the hypergeometric distribution, that the selection is random, was probably violated.


Example 12.2

In some classical experiments on pea-breeding Mendel obtained the following frequencies
for different kinds of seeds in crosses from plants with round yellow seeds and wrinkled
green seeds:

                           Observed    Theoretical

Round and yellow              315        312.75
Wrinkled and yellow           101        104.25
Round and green               108        104.25
Wrinkled and green             32         34.75

Total                         556        556.00

On the Mendelian theory of inheritance the frequencies should be in proportion 9, 3, 3, 1
and the theoretical frequencies are shown in the last column.

We find

$$\chi^2 = \frac{(2.25)^2}{312.75} + \frac{(3.25)^2}{104.25} + \frac{(3.75)^2}{104.25} + \frac{(2.75)^2}{34.75} = 0.4700.$$

The number of degrees of freedom $\nu = 3$. The probability of obtaining this value of $\chi^2$
or greater is seen to be between 0.9 and 0.95. There is thus nothing in the value of $\chi^2$ to
lead us to reject the Mendelian hypothesis.
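The calculation of Example 12.2 may be sketched as below. The theoretical frequencies are generated from the ratios 9 : 3 : 3 : 1, and $P$ is obtained from the closed form of the upper tail that holds for three degrees of freedom, $P = \operatorname{erfc}\sqrt{\chi^2/2} + \sqrt{2\chi^2/\pi}\,e^{-\chi^2/2}$. That closed form and the function names are additions for illustration, not part of the text.

```python
import math

def theoretical_from_ratios(ratios, total):
    """Split `total` in the given ratios, e.g. 9:3:3:1."""
    s = sum(ratios)
    return [total * r / s for r in ratios]

def chi2_statistic(observed, theoretical):
    return sum((l - lam) ** 2 / lam for l, lam in zip(observed, theoretical))

def upper_tail_3df(x):
    """Upper-tail probability of chi-square with exactly 3 degrees of freedom."""
    return math.erfc(math.sqrt(x / 2.0)) + math.sqrt(2.0 * x / math.pi) * math.exp(-x / 2.0)

observed = [315, 101, 108, 32]
theoretical = theoretical_from_ratios([9, 3, 3, 1], sum(observed))
statistic = chi2_statistic(observed, theoretical)
p_value = upper_tail_3df(statistic)
print(theoretical, round(statistic, 4), round(p_value, 3))
```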


12.11. Consider now a table of the type of Table 12.4, which shows the frequencies
of a number of men according to eye colour and hair colour. If, on some hypothesis as
to the relationship between eye and hair colour, we determine theoretical frequencies in
the body of the table, leaving the border row and border column unchanged, then there are
a number of linear constraints on these frequencies.

In fact, if in such a table there are $r$ rows and $s$ columns it will be found that only
$(r - 1)(s - 1)$ cells can be filled up arbitrarily. There are $rs$ cells altogether; but the fact
that the rows and columns must add to assigned totals imposes $r + s$ constraints. These,
however, are not independent, for the sum of the border column frequencies is equal to
that of the border row frequencies and thus there are only $r + s - 1$ independent linear
constraints. Hence $rs - (r + s - 1) = (r - 1)(s - 1)$ cells are independent and this is $\nu$,
the number of degrees of freedom associated with such a table.

Example 12.3

In Table 12.4 suppose that eye and hair colour are independent. Then the expected
frequency in any cell with a row total $x$ and a column total $y$ will be $\dfrac{xy}{n}$, where $n$ is the
total frequency.

TABLE 12.4

Distribution of 6800 Males according to Colour of Eye and Hair.
(Ammon, Zur Anthropologie der Badener.)

                                  Hair colour
Eye colour          Fair     Brown     Black     Red     Totals

Blue                1768       807       189      47       2811
Grey or Green        946      1387       746      53       3132
Brown                115       438       288      16        857

Totals              2829      2632      1223     116       6800

For instance, the expected number of men with fair hair and blue eyes is
$\dfrac{2811 \times 2829}{6800} = 1169$. The theoretical frequencies obtained in this way are:

                    Fair     Brown     Black     Red

Blue                1169      1088       506     48.0
Grey or Green       1303      1212       563     53.4
Brown                357       332       154     14.6

Hence

$$\chi^2 = \frac{(1768 - 1169)^2}{1169} + \text{etc.} = 1075.2,$$

$$\nu = (4 - 1)(3 - 1) = 6.$$

The value of $\chi^2$ is very improbable, $P$ being less than 0.000,001. We accordingly reject
the hypothesis of independence and conclude that hair colour and eye colour are associated.
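The rule "row total times column total over $n$" and the count of degrees of freedom from 12.11 are easily expressed in code. The sketch below, with illustrative function names, applies them to Table 12.4. Because it keeps the theoretical frequencies unrounded, the statistic it returns is a little below the 1075.2 of the text, which was computed from rounded frequencies.

```python
def expected_frequencies(table):
    """Theoretical frequencies under independence: row total * column total / n."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]

def contingency_chi2(table):
    """Return (chi-square, degrees of freedom) for an r x s table."""
    expected = expected_frequencies(table)
    chi2 = sum((l - lam) ** 2 / lam
               for row, exp_row in zip(table, expected)
               for l, lam in zip(row, exp_row))
    nu = (len(table) - 1) * (len(table[0]) - 1)
    return chi2, nu

# Rows: blue, grey or green, brown eyes.  Columns: fair, brown, black, red hair.
table = [
    [1768, 807, 189, 47],
    [946, 1387, 746, 53],
    [115, 438, 288, 16],
]
expected = expected_frequencies(table)
chi2, nu = contingency_chi2(table)
print(round(expected[0][0]), round(chi2, 1), nu)
```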


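12.12. It is useful to note that $\chi^2$ may be put into a form which is sometimes more
convenient for calculation. We have

$$\chi^2 = \Sigma\,\frac{(l - \lambda)^2}{\lambda} = \Sigma\,\frac{l^2}{\lambda} - 2\Sigma\,l + \Sigma\,\lambda = \Sigma\,\frac{l^2}{\lambda} - 2n + n = \Sigma\,\frac{l^2}{\lambda} - n. \tag{12.28}$$

When the $\lambda$'s are not integers it is easier to work with this formula, the squaring of the
larger numbers $l$ involving less arithmetic than the squaring of the smaller but non-integral
numbers $l - \lambda$.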
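As a check on (12.28), the two forms may be compared in exact rational arithmetic. The sketch below uses Mendel's frequencies from Example 12.2 purely as convenient data; the function names are illustrative. The identity requires the theoretical frequencies to add to the same total $n$ as the observed ones.

```python
from fractions import Fraction

def chi2_direct(observed, theoretical):
    """Sum of (l - lambda)^2 / lambda."""
    return sum((l - lam) ** 2 / lam for l, lam in zip(observed, theoretical))

def chi2_shortcut(observed, theoretical):
    """Sum of l^2 / lambda, less n, as in (12.28)."""
    return sum(l * l / lam for l, lam in zip(observed, theoretical)) - sum(observed)

observed = [315, 101, 108, 32]
n = sum(observed)
theoretical = [Fraction(n * r, 16) for r in (9, 3, 3, 1)]   # ratios 9:3:3:1

direct = chi2_direct(observed, theoretical)
shortcut = chi2_shortcut(observed, theoretical)
print(direct == shortcut, float(direct))
```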


12.13. In the foregoing examples the theoretical frequencies $\lambda$ were calculated without
reference to the experimental data other than totals which merely subjected them to linear
constraints and hence preserved the Type III distribution of $\chi^2$. There also arises the
much more difficult case in which certain parameters necessary for the determination of
the theoretical frequencies have to be determined from the data themselves. Suppose,
for example, we attempt to represent a frequency-distribution by a normal curve. We
have then to decide on the mean and variance of this curve, and they can, as a rule, only
be estimated from the data themselves. The question then arises, what happens to the
$\chi^2$-distribution if, instead of the unknown parameters leading to the theoretical frequencies $\lambda$,
we use estimates leading to the estimated theoretical frequencies, say, $\lambda'$? That is to say,
how does the distribution of the statistic

$$\chi'^2 = \Sigma\,\frac{(l - \lambda')^2}{\lambda'}$$

compare with that of

$$\chi^2 = \Sigma\,\frac{(l - \lambda)^2}{\lambda}\,? \tag{12.29}$$

This problem has not yet been completely solved. The nearest approach to a solution
has been reached by R. A. Fisher (1924), who showed that $\chi'^2$ is distributed in the Type III
form, provided that

(a) the sample is large and that each cell-frequency is large;

(b) that the number of degrees of freedom is reduced by unity for every parameter
estimated;

(c) that the principle of estimation involved is such as to minimise $\chi'^2$. This is
equivalent, for large samples, to taking a maximum likelihood estimate.

Departing from our usual practice, we shall have at this stage to state this result
without proof. It cannot be adequately discussed until we have dealt with the principles
of estimation in the second volume.

Example 12.4

The following table shows the distribution of 12 dice thrown 26,306 times, a 5 or 6 being
reckoned a success. We have encountered these data before in Table 5.1.

TABLE 12.5

Distribution of 12 Dice thrown 26,306 times, a 5 or 6 being reckoned a Success.

Number of         Observed       Frequency of               Frequency of
Successes.        Frequency.     26,306(2/3 + 1/3)^12.      26,306(0.6623 + 0.3377)^12.

0                     185               203                        187
1                   1,149             1,217                      1,146
2                   3,265             3,345                      3,215
3                   5,475             5,576                      5,465
4                   6,114             6,273                      6,269
5                   5,194             5,018                      5,115
6                   3,067             2,927                      3,043
7                   1,331             1,254                      1,330
8                     403               392                        424
9                     105                87                         96
10 and over            18                14                         16

Totals             26,306            26,306                     26,306

The third column shows the frequencies if the dice were perfect, that is the frequencies
of the binomial law $26{,}306(\tfrac23 + \tfrac13)^{12}$. We find, in the usual way,

$$\chi^2 = \frac{(203 - 185)^2}{203} + \text{etc.} = 35.941, \qquad \nu = 10.$$

$P$ is very small, less than 0.000,1 and we conclude that the hypothesis is to be rejected,
i.e. that the dice were not perfect or that something was wrong with the sampling. In
this particular case great care was in fact taken in rolling the dice and the balance lies in
favour of rejecting the hypothesis that they were entirely unbiassed.

Let us then reconsider the data. If the dice are biassed, what is the true probability
of getting a 5 or 6? This we must estimate from the data and it has already been seen
that a maximum likelihood estimate is the mean number of successes in the sample itself.
This is found to be 0.3377 and the last column in Table 12.5 shows the frequencies
$26{,}306(0.6623 + 0.3377)^{12}$. The agreement with observation is evidently closer and we
find now $\chi^2 = 8.201$. $\nu$ is now 9, for we have estimated one parameter. $P$ is now about
0.5, so that the observed frequencies are in quite good accord with theory.
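Both values of $\chi^2$ in Example 12.4 follow directly from the columns of Table 12.5. In the sketch below the theoretical frequencies are copied from the table rather than recomputed from the binomial, and $P$ is only approximated, by treating the Wilson-Hilferty quantity (12.27) as a unit normal deviate. That substitution for the tables used in the text, and the function names, are assumptions made for illustration.

```python
from statistics import NormalDist

def chi2_statistic(observed, theoretical):
    return sum((l - lam) ** 2 / lam for l, lam in zip(observed, theoretical))

def upper_tail_wilson_hilferty(chi2, nu):
    """Approximate P by referring the deviate (12.27) to the normal distribution."""
    z = ((chi2 / nu) ** (1.0 / 3.0) - 1.0 + 2.0 / (9.0 * nu)) * (9.0 * nu / 2.0) ** 0.5
    return 1.0 - NormalDist().cdf(z)

observed = [185, 1149, 3265, 5475, 6114, 5194, 3067, 1331, 403, 105, 18]
perfect  = [203, 1217, 3345, 5576, 6273, 5018, 2927, 1254, 392, 87, 14]   # p = 1/3
fitted   = [187, 1146, 3215, 5465, 6269, 5115, 3043, 1330, 424, 96, 16]   # p = 0.3377

chi2_perfect = chi2_statistic(observed, perfect)
chi2_fitted = chi2_statistic(observed, fitted)

nu_perfect = len(observed) - 1        # totals agree
nu_fitted = len(observed) - 1 - 1     # one further parameter estimated from the data

p_perfect = upper_tail_wilson_hilferty(chi2_perfect, nu_perfect)
p_fitted = upper_tail_wilson_hilferty(chi2_fitted, nu_fitted)
print(round(chi2_perfect, 3), round(chi2_fitted, 3), p_perfect, round(p_fitted, 2))
```

In the extreme tail the normal approximation is rough, so the first value of $P$ should be read only as confirming that it is very small.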

12.14. Since $\chi^2$ is the sum of the squares of a certain number of independent normal
variates each with zero mean and unit variance, a number of different values of $\chi^2$ may
be added together and will be distributed in the Type III form with a number of degrees
of freedom equal to the sum of the individual numbers. This result enables us to combine
the results of a set of experiments so as to determine the probability of the whole set taken
together. For example, Table 12.6 shows the data for inoculation against cholera on a
certain tea estate.


TABLE 12.6

Inoculation against Cholera on a certain Tea Estate.

                    Attacked.    Not Attacked.    Totals.

Inoculated             431             5            436
Not inoculated         291             9            300

Totals                 722            14            736

We find $\chi^2 = 3.27$, $\nu = 1$ and $P$, from Appendix Table 7, about 0.071. This is
small, but not small enough to reject the hypothesis, particularly when we note that
the theoretical frequencies in the not-attacked column are far from large.

The results for six such estates were

           χ²        P        ν

          3.27     0.071      1
          9.34     0.0022     1
          6.08     0.014      1
          2.51     0.11       1
          5.61     0.018      1
          1.59     0.21       1

Total    28.40                6

Here only one value of $P$ is less than 0.01, and we might be inclined to doubt the reality
of association between inoculation and immunity. The sum of $\chi^2$ is, however, 28.40, and
$\nu = 6$, for which values we find $P < 0.000{,}1$; so that together the results are significant.
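The pooling of the six estates can be sketched as follows. For an even number of degrees of freedom the upper tail of (12.10) reduces to a finite sum of Poisson terms, $P = e^{-\chi^2/2}\sum_{k<\nu/2}(\tfrac12\chi^2)^k/k!$, which is used here in place of tables. The function name is illustrative.

```python
from math import exp

def chi2_upper_tail_even(x, nu):
    """Upper-tail probability of chi-square for an even number of degrees of freedom."""
    if nu % 2 != 0 or nu <= 0:
        raise ValueError("nu must be a positive even integer")
    half = x / 2.0
    term = total = 1.0
    for k in range(1, nu // 2):
        term *= half / k          # (x/2)^k / k!
        total += term
    return exp(-half) * total

values = [3.27, 9.34, 6.08, 2.51, 5.61, 1.59]   # one value of chi-square per estate
degrees = [1] * len(values)                     # each 2 x 2 table has one degree of freedom

total_chi2 = sum(values)
total_nu = sum(degrees)
p_combined = chi2_upper_tail_even(total_chi2, total_nu)
print(round(total_chi2, 2), total_nu, p_combined)
```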

The 2 × 2 Bivariate Table

12.15. We now return to a point which has been mentioned incidentally. If the
theoretical frequencies in cells are small, the Type III distribution will hold only as an
approximation depending on how closely the binomial distributions in individual cells are
represented by normal distributions. For some problems we can overcome the
difficulty by grouping small frequencies, as has been done above in Example 12.1. But
such a process sacrifices information and cannot always be carried out, e.g. in a 2 × 2
bivariate table.

Consider in the first place the symmetrical binomial $(\tfrac12 + \tfrac12)^{10}$. The second column
in the following table shows the probability $P$ that the number of successes in the first
column will be at most attained (for the smaller of the pair) or attained or exceeded (for the
larger of the pair).

Successes       P         P₁        P₂

0, 10         0.0010    0.0008    0.0022
1, 9          0.0108    0.0057    0.0134
2, 8          0.0547    0.0290    0.0569
3, 7          0.1719    0.1030    0.1714
4, 6          0.3770    0.2635    0.3759

If we regard this frequency as that of a single cell ($\nu = 1$) we should, for the
corresponding $\chi$-distribution, have the positive half of the normal curve. The values
corresponding to $\chi^2 = \tfrac25(5^2, 4^2, 3^2, 2^2, 1^2)$ are shown in the third column as $P_1$. They may be obtained
from Appendix Tables 6 and 7, e.g. for the last term we have $\chi^2 = \tfrac25 = 0.4$, $P = 0.52709$,
$P_1 = \tfrac12$ of this $= 0.2635$.

The correspondence between $P$ and $P_1$ is evidently not very good. We can, however,
improve it considerably by a correction due to Yates (1934). The distribution of $\chi^2$ is
continuous, whereas that of the binomial is not. To bring the two into comparability
we really should consider the binomial frequency at the value $r$ as spread over the range
$r - \tfrac12$ to $r + \tfrac12$. For example, for a deviation 3 corresponding to 8 successes we should
take a deviation 2.5, giving $\chi^2 = \dfrac{2(2.5)^2}{5} = 2.5$ instead of 3.6. The values given by the
corrected $\chi^2$ are shown as $P_2$ above. The agreement between $P_2$ and $P$ is evidently a great
improvement on that between $P_1$ and $P$.
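The three columns of the table in 12.15 can be recomputed as below. This is a sketch with illustrative function names. For $\nu = 1$ the upper tail of $\chi^2$ is twice the normal tail, so half of it is $\tfrac12\operatorname{erfc}\sqrt{\chi^2/2}$; the continuity correction simply reduces the deviation by one-half before squaring.

```python
from math import comb, erfc, sqrt

def binomial_tail(k, n=10):
    """Exact P: probability of at most k successes in the binomial (1/2 + 1/2)^n."""
    return sum(comb(n, j) for j in range(k + 1)) / 2 ** n

def half_chi2_tail(deviation, n=10, corrected=False):
    """P1 (uncorrected) or P2 (continuity-corrected): half the chi-square tail, nu = 1."""
    d = deviation - 0.5 if corrected else deviation
    chi2 = 2.0 * d * d / (n / 2.0)      # two cells, each with theoretical frequency n/2
    return 0.5 * erfc(sqrt(chi2 / 2.0))

rows = []
for k in range(5):                       # successes 0..4, paired with 10..6
    deviation = 5 - k
    rows.append((k, 10 - k,
                 binomial_tail(k),
                 half_chi2_tail(deviation),
                 half_chi2_tail(deviation, corrected=True)))

for k, mirror, p, p1, p2 in rows:
    print(f"{k:2d},{mirror:3d}  {p:.4f}  {p1:.4f}  {p2:.4f}")
```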

When the theoretical proportion in a cell is not $\tfrac12$ the binomial distribution is skew,
and there do not appear to be any simple corrections to compensate for this effect. The 
continuity correction will, however, result in an improvement if the theoretical frequency 
is near 1 and is probably best made in all circumstances. 

12.16. The 2 × 2 table may also be dealt with by exact methods. Consider in fact
the table

      a          b          a + b
      c          d          c + d

    a + c      b + d          n

If the two variates are independent, the number of ways in which a table with such
marginal totals can be constructed from the $n$ sample members is

$$\binom{n}{a+c}\binom{n}{a+b} = \frac{n!}{(a+c)!\,(b+d)!}\cdot\frac{n!}{(a+b)!\,(c+d)!}.$$

The number of ways in which the body of the array can be completed is

$$\frac{n!}{a!\,b!\,c!\,d!}.$$

Consequently the probability of the distribution of the table is

$$\frac{(a+c)!\,(b+d)!\,(a+b)!\,(c+d)!}{n!\,a!\,b!\,c!\,d!}. \tag{12.30}$$

Thus the successive probabilities for $d = 0, 1, 2, \dots$ are the successive terms in the
hypergeometric series

$$F\{-(c+d),\ -(b+d),\ a-d+1,\ 1\}. \tag{12.31}$$
Example 12.5 (from F. Yates, 1934, quoting data by M. Hellman).

The following table shows the number of children classified according to the nature
of the teeth and type of feeding.

                 Normal Teeth.    Maloccluded Teeth.    Totals.

Breast fed             4                 16                20
Bottle fed             1                 21                22

Totals                 5                 37                42

From (12.30) the probability of obtaining no normal breast-fed children if the attributes
are independent is ($a = 0$)

$$\frac{5!\,37!\,20!\,22!}{42!\,20!\,0!\,5!\,17!} = 0.03096.$$

The probabilities of obtaining 1, 2, ... children are obtained by multiplying successively
by $\dfrac{5 \times 20}{1 \times 18}$, $\dfrac{4 \times 19}{2 \times 19}$, $\dfrac{3 \times 18}{3 \times 20}$ and so on, and are as follows:

Number of Normal
Breast-fed Children.    Probability.

        0                  0.0310
        1                  0.1720
        2                  0.3440
        3                  0.3096
        4                  0.1253
        5                  0.0182

                           1.0001

Thus the probability of getting four or more normal breast-fed children is 0.1435, and
we conclude that there is nothing to reject the hypothesis that breast feeding exerts no
effect on the condition of the teeth. Had we used the $\chi^2$ test in the ordinary way we
should have found $P = 0.0612$, less than half the true value. The continuity correction
makes a great improvement, giving $P = 0.1427$.
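The exact treatment of Example 12.5 may be sketched as follows. Expression (12.30) is rewritten in the equivalent form $\binom{a+b}{a}\binom{c+d}{c}\big/\binom{n}{a+c}$, which avoids large factorials. The function and variable names are illustrative.

```python
from math import comb

def table_probability(a, row1, row2, col1):
    """Probability (12.30) of a 2 x 2 table with top-left cell a and the given margins."""
    return comb(row1, a) * comb(row2, col1 - a) / comb(row1 + row2, col1)

row1, row2, col1 = 20, 22, 5      # breast fed, bottle fed, normal teeth
probabilities = [table_probability(a, row1, row2, col1) for a in range(col1 + 1)]

# Probability of four or more normal breast-fed children (the observed table has a = 4).
p_four_or_more = sum(probabilities[4:])

for a, p in enumerate(probabilities):
    print(a, f"{p:.4f}")
print(f"{p_four_or_more:.4f}")
```

The sum is taken over the observed table and the one more extreme table in the same direction, which is the one-sided probability used in the text.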


NOTES AND REFERENCES 

The $\chi^2$-distribution, though known to Helmert in 1876, was rediscovered and applied
to statistical problems by Karl Pearson in 1900. In 1922 Yule and Fisher gave respectively
experimental and theoretical evidence for what is now accepted as the correct method
of determining the number of degrees of freedom in a bivariate table; but Pearson himself
seems never to have acknowledged the soundness of this method, and some papers written
between 1920 and 1930 on this subject are controversial and therefore not to be accepted
uncritically.

For the use of the distribution in testing hypotheses when parent parameters have
to be estimated, see Fisher (1924). For the exact test in a 2 × 2 table and the continuity
correction, see Yates (1934).

More recently, Cochran (1936) and Haldane (1937a, 1937b, 1939a, 1939b) have
discussed the distribution of $\chi^2$ in bivariate tables when some hypothetical frequencies are
small. See also Haldane, Biometrika (1945), 33, 231 and 234.

Cochran, W. G. (1936), "The $\chi^2$-distribution for the binomial and Poisson series, with
small expectations," Ann. Eugen., Lond., 7, 207.

Cochran, W. G. (1938), "Note on J. B. S. Haldane's paper [1937a below]," Biometrika, 29, 407.

Fisher, R. A. (1922), "On the interpretation of $\chi^2$ from contingency tables and the
calculation of P," Jour. Roy. Statist. Soc., 85, 87.

Fisher, R. A. (1924), "The conditions under which $\chi^2$ measures the discrepancy between observation
and hypothesis," Jour. Roy. Statist. Soc., 87, 442.

Garwood, F. (1936), "Fiducial limits for the Poisson distribution," Biometrika, 28, 437.

Haldane, J. B. S. (1937a), "The exact value of the moments of the distribution of $\chi^2$ used
as a test of goodness of fit when expectations are small," Biometrika, 29, 133.

Haldane, J. B. S. (1937b), "The first six moments of $\chi^2$ for an $n$-fold table with $n$ degrees of freedom
when some expectations are small," Biometrika, 29, 389.

Haldane, J. B. S. (1937c), "The approximate normalisation of a class of frequency distributions,"
Biometrika, 29, 392.

Haldane, J. B. S. (1939a), "Corrections to formulae in papers on the moments of $\chi^2$," Biometrika, 31, 220.

Haldane, J. B. S. (1939b), "The mean and variance of $\chi^2$ when used as a test of homogeneity when
samples are small," Biometrika, 31, 346.

Haldane, J. B. S. (1939c), "The cumulants and moments of the binomial distribution and the cumulants
of $\chi^2$ for an $n \times 2$ fold table," Biometrika, 31, 392.

Wilson, E. B., and Hilferty, M. M. (1931), "The distribution of chi-square," Nat. Acad.
Sci., 17, 694.

Yates, F. (1934), "Contingency tables involving small numbers and the $\chi^2$ test," Supp.
Jour. Roy. Statist. Soc., 1, 217.

Yule, G. U. (1922), "An application of the $\chi^2$ method to association and contingency tables
with experimental illustrations," Jour. Roy. Statist. Soc., 85, 95.

EXERCISES 


12.1. By the method of 12.8 show that




is approximately normally distributed with unit variance about zero mean. 

(Haldane, 1937c.)


12.2. Use the $\chi^2$-distribution to show that the distribution of digits from telephone
directories (Table 1.4) could not in all probability have arisen by random sampling from
a population in which each of the ten digits occurred with the same frequency.


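12.3. Show that for a 2 × n bivariate table $\nu = n - 1$ and

$$\chi^2 = n_1 n_2 \sum_{j=1}^{n} \frac{1}{a_{1j} + a_{2j}} \left( \frac{a_{1j}}{n_1} - \frac{a_{2j}}{n_2} \right)^2,$$

where $a_{1j}$, $a_{2j}$ are the frequencies in the $j$th column and $n_1$, $n_2$ are the border sums of the
two rows.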


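12.4. Show that if $\nu$ is even

$$P = \int_{\chi^2}^{\infty} \frac{e^{-x/2}\, x^{\nu/2 - 1}}{2^{\nu/2}\, \Gamma(\nu/2)}\, dx
= e^{-\chi^2/2} \left\{ 1 + \frac{\chi^2}{2} + \frac{\chi^4}{2 \cdot 4} + \dots + \frac{\chi^{\nu - 2}}{2 \cdot 4 \cdot 6 \cdots (\nu - 2)} \right\},$$

and hence that the values of $P$ for given $\chi^2$ can be derived from tables of the Poisson
exponential limit.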


12.5. Show that in a 2 × 2 table whose frequencies are

    a    b
    c    d

$$\chi^2 = \frac{(a + b + c + d)(ad - bc)^2}{(a + b)(c + d)(b + d)(a + c)},$$

the theoretical frequencies being those obtained on the hypothesis that the two variates
are independent.

12.6. An experiment gives, on hypothesis $H$, $\chi^2 = 9$, $\nu = 8$. When repeated it gives
the same result. Show that the two taken together do not give the same confidence in
$H$ as either taken separately.





12.7. (Data from Report on the Spahlinger Experiments in Northern Ireland, 1931-1934,
H.M. Stationery Office, 1935.) In experiments on the immunisation of cattle from
tuberculosis the following results were secured:

                                      Died of tuberculosis    Unaffected or only
                                      or very seriously       slightly affected    Totals
                                      affected
    Inoculated with vaccine                    6                     13              19
    Not inoculated or inoculated
      with control media                       8                      3              11
    Totals                                    14                     16              30

Show that for this table, on the hypothesis that inoculation and susceptibility to
tuberculosis are independent, $\chi^2 = 4.75$, $P = 0.029$; with a correction for continuity
the corresponding probability is 0.072; and that by the exact method of 12.15 $P = 0.070$.

12.8. (Data from Yule, Jour. Anthrop. Inst., 1906, 36, 325.)

Sixteen pieces of photographic paper were printed down to different depths of colour
from nearly white to a very deep blackish-brown. Small scraps were cut from each sheet
and pasted on cards, two scraps on each card one above the other, combining scraps from
the several sheets in all possible ways, so that there were 256 cards in the pack. Twenty
observers then went through the pack independently, each one naming each tint either
"light," "medium" or "dark."

The following table shows the name assigned to each of the two pieces of paper:

    Name assigned to        Name assigned to upper tint
    lower tint              Light     Medium     Dark      Totals
    Light                    850       571        580       2001
    Medium                   618       593        455       1666
    Dark                     540       456        457       1453
    Totals                  2008      1620       1492       5120

Show that there is a significant association between the name assigned to one piece
and the name assigned to the other.






CHAPTER 13


ASSOCIATION AND CONTINGENCY 

13.1. This and the next three chapters deal with the relationship between two or
more variables. We shall consider populations, the members of which each bear one of
each of several different sets of qualities or one value of each of several different variables,
and shall discuss the measurement of the relationship among the qualities or variables
in the populations. The corresponding questions of sampling will also fall for consideration.
We may denote this branch of the theory by the general name of Theory of Dependence.

Association

13.2. Consider in the first instance a population classified according to whether
each member bears or does not bear an attribute $A$. The presence of the attribute we
may denote by $A$ and the absence by $\alpha$. We shall assume that each member must either
be an $A$ or an $\alpha$, so that $\alpha$ = not-$A$ and $A$ = not-$\alpha$.

Suppose that each member of the population is classified according to two attributes
$A$ and $B$. Each may then be one of four kinds, $AB$, $A\beta$, $\alpha B$ and $\alpha\beta$. For example, if the
attributes are the possession of blue eyes ($A$) and the possession of male sex ($B$), we shall
have the four possible classes $AB$ = blue-eyed males, $A\beta$ = blue-eyed females, $\alpha B$ =
not-blue-eyed males, $\alpha\beta$ = not-blue-eyed females. Denoting the number in any class by the
letters appropriate to that class in ordinary round brackets, we may then specify the
population in the tabular form:



                 B's             not-B's           Totals
    A's          $(AB)$          $(A\beta)$        $(A)$
    not-A's      $(\alpha B)$    $(\alpha\beta)$   $(\alpha)$
    Totals       $(B)$           $(\beta)$         $N$                (13.1)

or, more simply, by

    a          b          a + b
    c          d          c + d
    a + c      b + d      N                                            (13.2)

where $a = (AB)$, etc. Here $N$ is the total number in the population.


13.3. If there is no relationship between the attribute $A$ and the attribute $B$, that
is to say if the possession of $A$ is irrelevant to the possession of $B$, then there must be
the same proportion of $A$'s among the $B$'s as among the $\beta$'s. We thus define two attributes
to be independent if

$$\frac{(AB)}{(B)} = \frac{(A\beta)}{(\beta)}, \tag{13.3}$$

that is, if

$$\frac{a}{a + c} = \frac{b}{b + d} = \frac{a + b}{N}. \tag{13.4}$$

If this is so, the following are also true:

$$\frac{c}{a + c} = \frac{d}{b + d} = \frac{c + d}{N}, \tag{13.5}$$

$$\frac{a}{a + b} = \frac{c}{c + d} = \frac{a + c}{N}, \tag{13.6}$$

$$\frac{b}{a + b} = \frac{d}{c + d} = \frac{b + d}{N}. \tag{13.7}$$

These are derivable by simple algebra from (13.4) and, in words, will be found to express
the same fundamental fact that the proportion of members bearing an attribute $X$ is the
same among the $Y$'s as among the not-$Y$'s. It also follows that

$$a = \frac{(a + b)(a + c)}{N}, \quad \text{i.e.} \quad (AB) = \frac{(A)(B)}{N}, \tag{13.8}$$

with three similar equations in $b$, $c$ and $d$.

If now in any given table

$$a > \frac{(a + b)(a + c)}{N}, \tag{13.9}$$

or, in the alternative notation,

$$(AB) > \frac{(A)(B)}{N}, \tag{13.10}$$

we shall say that $A$ and $B$ are positively associated. Per contra, if

$$(AB) < \frac{(A)(B)}{N}, \tag{13.11}$$

they are said to be negatively associated or disassociated.


Example 13.1

Association between inoculation against cholera and exemption from attack. (Greenwood
and Yule (1915), Proc. Roy. Soc. Medicine, 8, 113.) The following table shows 818
cases classified according to inoculation against cholera (attribute $A$) and freedom from
attack (attribute $B$).

                        Not attacked    Attacked    Totals
    Inoculated               276            3         279
    Not-inoculated           473           66         539
    Totals                   749           69         818

If the attributes were independent the frequency in the inoculated-not-attacked class
would be $\frac{279 \times 749}{818} = 255$. The observed frequency, 276, is greater than this and hence
inoculation is positively associated with exemption from attack.
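
The comparison made in this example can be sketched in a few lines of Python. This is only an illustration of the test (13.9) applied to the table above; the function name `independence_frequency` and the variable names are assumptions introduced here, not notation from the text.

```python
def independence_frequency(a, b, c, d):
    """Frequency expected in the (AB) cell under independence: (A)(B)/N."""
    n = a + b + c + d
    return (a + b) * (a + c) / n


# Example 13.1. Rows: inoculated, not inoculated.
# Columns: not attacked, attacked.
a, b, c, d = 276, 3, 473, 66

expected_ab = independence_frequency(a, b, c, d)
if a > expected_ab:
    verdict = "positively associated"
elif a < expected_ab:
    verdict = "negatively associated"
else:
    verdict = "independent"

print(f"independence frequency = {expected_ab:.1f}, observed = {a}: {verdict}")
```

Because the border totals are fixed, testing the single cell $(AB)$ is enough; the other three cells depart from their independence frequencies by the same amount, with alternating sign.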


13.4. The reader will recognise in this example a type of 2 × 2 table which was
discussed in connection with the $\chi^2$-distribution. In fact, if the data are considered as a
sample there arises at once the question how far the positive association, which certainly
exists in the sample, is indicative of real association in the parent population. The $\chi^2$
distribution, as shown in the previous chapter, provides an objective method of forming
a judgment on this matter. $\chi^2$ itself, however, does not provide an adequate measure of
the intensity of the association. Altogether apart from sampling questions, we sometimes
wish to compare the strength of associations in different populations or between different
attributes, and some coefficients proposed for the purpose will now be considered.


13.5. The more obvious desiderata in a coefficient measuring association are (a)
that it shall vanish when the attributes are independent; (b) that it shall be +1 when
there is complete positive association and -1 when there is complete negative association;
(c) that it should increase as the frequencies proceed from dissociation to association. As
to this latter point, consider the difference between observed and "independence"
frequencies in the cell corresponding to $(AB)$, viz.:

$$\delta = (AB) - \frac{(A)(B)}{N}. \tag{13.12}$$

Since the border frequencies are constant it is evident that the difference in any cell between
observed and "independence" frequencies is $\pm\delta$ and thus $\delta$ determines uniquely the
departure from independence. We may interpret condition (c) as meaning that our
coefficient should increase with $\delta$. It may be noted that

$$\delta = a - \frac{(a + b)(a + c)}{a + b + c + d} = \frac{ad - bc}{N}. \tag{13.13}$$

Following Yule (1900, 1912) we define the coefficient of association $Q$ by the equation

$$Q = \frac{ad - bc}{ad + bc} = \frac{N\delta}{ad + bc}. \tag{13.14}$$

It is zero if the attributes are independent, for then $\delta = 0$. It can equal +1 only if $bc = 0$,
in which case there is complete association (either no $\alpha$'s are $B$'s or no $A$'s are $\beta$'s), and -1
only if $ad = 0$, in which case there is complete disassociation. Furthermore, $Q$ increases
with $\delta$, for if $e = \dfrac{bc}{ad}$ then $Q = \dfrac{1 - e}{1 + e}$ and $\dfrac{dQ}{de}$ is negative, as is $\dfrac{de}{d\delta}$, so that $\dfrac{dQ}{d\delta}$ is positive.

A somewhat similar coefficient, also due to Yule, is the coefficient of colligation

$$Y = \frac{1 - \sqrt{\dfrac{bc}{ad}}}{1 + \sqrt{\dfrac{bc}{ad}}}. \tag{13.15}$$

This also satisfies our conditions, as the reader may verify for himself.


13.6. Yet a third coefficient, which will be shown below to be related to $\chi^2$, is

$$V = \frac{N\delta}{\sqrt{(A)(\alpha)(B)(\beta)}} = \frac{ad - bc}{\{(a + b)(a + c)(b + d)(c + d)\}^{1/2}}. \tag{13.16}$$

This is evidently zero when $\delta = 0$ and increases with $\delta$. If $V = \pm 1$ we have

$$(a + b)(a + c)(b + d)(c + d) = (ad - bc)^2,$$

giving

$$4abcd + a^2(bc + bd + cd) + b^2(ac + ad + cd) + c^2(ab + ad + bd) + d^2(ac + ab + bc) = 0.$$

Since no frequency can be negative this can only vanish if at least two of $a$, $b$, $c$, $d$ are zero.
If the frequencies in the same row or column vanish the case is purely nugatory. We
have then only to consider $a = 0$, $d = 0$ or $b = 0$, $c = 0$. In the first case $V = -1$, in the
second $V = +1$. It cannot lie outside these limits.


13.7. It will be observed that whereas $V$ is unity only if two frequencies in the 2 × 2
table vanish, $Q$ and $Y$ are unity if only one frequency vanishes. This raises a point in
connection with the definition of complete association. We shall say that association is
complete if all $A$'s are $B$'s, notwithstanding that all $B$'s are not $A$'s. If all dumb men
are deaf there is complete association between dumbness and deafness, however many
deaf men there are who are not dumb. The coefficient $V$ is unity only if all $A$'s
are $B$'s and all $B$'s are $A$'s, a condition which we could, if so desired, describe as absolute
association.

It is necessary to point out in this connection that statistical association is different
from association in the colloquial sense. In current speech we say that $A$ and $B$ are
associated if they occur together fairly often; but in statistics they are associated only
if $A$ occurs relatively more or less frequently among the $B$'s than among the not-$B$'s. If
90 per cent. of smokers have poor digestions we cannot say that smoking and poor digestion
are associated until it is shown that less than 90 per cent. of non-smokers have poor
digestions.


Example 13.2

Consider again the data of Example 13.1. For the various coefficients we have

$$Q = \frac{(276 \times 66) - (3 \times 473)}{(276 \times 66) + (3 \times 473)} = 0.8555,$$

$$Y = \frac{1 - \sqrt{\dfrac{3 \times 473}{276 \times 66}}}{1 + \sqrt{\dfrac{3 \times 473}{276 \times 66}}} = 0.5636,$$

$$V = \frac{(276 \times 66) - (3 \times 473)}{\sqrt{279 \times 539 \times 749 \times 69}} = 0.1905.$$

These values are, as might be expected, different, although they all refer to the same intensity
of association. Comparisons, however, naturally fall to be made between values of
coefficients of the same type, and the fact that different types give different values does not
affect their usefulness or the comparability of members of any one type.
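
The three coefficients can be computed together from the four cell frequencies. The following is a minimal sketch, assuming the definitions (13.14), (13.15) and (13.16); the function name `association_coefficients` is introduced here for illustration only, and the code assumes $ad > 0$ and non-zero border totals so that no division by zero arises.

```python
import math


def association_coefficients(a, b, c, d):
    """Return (Q, Y, V) for the 2 x 2 table with rows (a, b) and (c, d).

    Assumes a*d > 0 and that no border total is zero.
    """
    ad, bc = a * d, b * c
    q = (ad - bc) / (ad + bc)            # coefficient of association, (13.14)
    root_e = math.sqrt(bc / ad)
    y = (1 - root_e) / (1 + root_e)      # coefficient of colligation, (13.15)
    v = (ad - bc) / math.sqrt((a + b) * (a + c) * (b + d) * (c + d))  # (13.16)
    return q, y, v


# Cholera inoculation data of Example 13.1.
q, y, v = association_coefficients(276, 3, 473, 66)
print(f"Q = {q:.4f}, Y = {y:.4f}, V = {v:.4f}")
```

The same function makes it easy to confirm numerically that $Q = 2Y/(1 + Y^2)$, the relation asked for in Exercise 13.6.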

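13.8. The methods of Chapter 9 may be used to give the standard errors of the three
coefficients based on material obtained by random sampling.

We recall that for any of the four frequencies $a$, $b$, $c$, $d$ we have results such as

$$\operatorname{var} a = \frac{a(N - a)}{N}, \tag{13.17}$$

$$\operatorname{cov}(a, b) = -\frac{ab}{N}. \tag{13.18}$$

The first is merely another way of writing the expression for the variance of a binomial.
The second follows from

$$\operatorname{var}(a + b) = \operatorname{var} a + \operatorname{var} b + 2 \operatorname{cov}(a, b).$$

Then for $e = \dfrac{bc}{ad}$ we have, writing $\Delta$ for the differential to avoid confusion with the
frequency $d$,

$$\frac{\Delta e}{e} = \frac{\Delta b}{b} + \frac{\Delta c}{c} - \frac{\Delta a}{a} - \frac{\Delta d}{d},$$

whence

$$\frac{\operatorname{var} e}{e^2} = \sum \frac{\operatorname{var} a}{a^2} + 2\left\{ \frac{\operatorname{cov}(b, c)}{bc} + \frac{\operatorname{cov}(a, d)}{ad} - \frac{\operatorname{cov}(a, b)}{ab} - \frac{\operatorname{cov}(a, c)}{ac} - \frac{\operatorname{cov}(b, d)}{bd} - \frac{\operatorname{cov}(c, d)}{cd} \right\},$$

the summation extending over the four frequencies. Substitution from (13.17) and (13.18) gives

$$\operatorname{var} e = e^2 \left( \frac{1}{a} + \frac{1}{b} + \frac{1}{c} + \frac{1}{d} \right), \tag{13.19}$$

and hence, since $Q = (1 - e)/(1 + e)$,

$$\Delta Q = -\frac{2\,\Delta e}{(1 + e)^2},$$

giving

$$\operatorname{var} Q = \frac{(1 - Q^2)^2}{4} \left( \frac{1}{a} + \frac{1}{b} + \frac{1}{c} + \frac{1}{d} \right). \tag{13.20}$$

In a similar way we have

$$\operatorname{var} Y = \frac{(1 - Y^2)^2}{16} \left( \frac{1}{a} + \frac{1}{b} + \frac{1}{c} + \frac{1}{d} \right). \tag{13.21}$$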


The sampling variance of $V$ may be found similarly but involves rather more lengthy
algebra. Yule (1912) gives the result

$$\operatorname{var} V = \frac{1}{N} \left\{ 1 - V^2 + \left( V + \tfrac{1}{2} V^3 \right) \frac{(a - d)^2 - (b - c)^2}{\sqrt{(a + b)(a + c)(b + d)(c + d)}} - \frac{3}{4} V^2 \left[ \frac{(a + b - c - d)^2}{(a + b)(c + d)} + \frac{(a + c - b - d)^2}{(a + c)(b + d)} \right] \right\}. \tag{13.22}$$

In applying these formulae it is, as usual in large-sample theory, assumed that the
observed frequencies may be used instead of theoretical frequencies in the sampling
variances.


Example 13.3

Reverting to Example 13.2, we have for the standard error of $Q$

$$\text{s.e.}(Q) = \frac{1 - Q^2}{2} \sqrt{\frac{1}{276} + \frac{1}{3} + \frac{1}{473} + \frac{1}{66}} = 0.0798.$$

The coefficient $Q$ thus probably lies in the range $0.856 \pm 0.239$ in the population from
which these data were derived, assuming of course that the sampling was random. The
upper limit here, of course, must be unity.
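
The calculation of this example, and the corresponding one for $Y$, can be sketched as follows. The code is an illustration under the large-sample formulae (13.20) and (13.21), with observed frequencies used in place of theoretical ones; the function name `standard_errors` is an assumption made here.

```python
import math


def standard_errors(a, b, c, d):
    """Large-sample standard errors of Q and Y from (13.20) and (13.21).

    Assumes all four frequencies are non-zero.
    """
    ad, bc = a * d, b * c
    q = (ad - bc) / (ad + bc)
    root_e = math.sqrt(bc / ad)
    y = (1 - root_e) / (1 + root_e)
    # Square root of the sum of reciprocals of the four frequencies.
    root_sum = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    se_q = (1 - q * q) / 2 * root_sum
    se_y = (1 - y * y) / 4 * root_sum
    return q, se_q, y, se_y


q, se_q, y, se_y = standard_errors(276, 3, 473, 66)
print(f"Q = {q:.3f} +/- {3 * se_q:.3f} (three standard errors)")
print(f"Y = {y:.3f}, s.e. = {se_y:.4f}")
```

The sum of reciprocals is dominated by the smallest cell (here the frequency 3), which is why a single sparse cell makes both coefficients imprecise.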


Partial Association

13.9. The coefficients described above measure the dependence of two attributes in
the statistical sense, but in order to decide whether such dependence has any causal
significance it is often necessary to consider associations in sub-populations. Suppose, for
example, a positive association is noticed between inoculation and exemption from attack.
It is natural to infer that the inoculation confers exemption, but this is not necessarily so.
It might be that the people who are inoculated are drawn largely from the richer classes,
who live in better hygienic conditions and are therefore better equipped to resist attack
or less exposed to risk. In other words, the association of $A$ and $B$ might be due to the
association of both with a third attribute $C$ (wealth).

Now it is clear that this explanation would not hold if the hygienic circumstances were
constant in the population. If we then consider the association of $A$ and $B$ in the
sub-populations $(C)$ (well-to-do classes) and $(\gamma)$ (poorer classes) and find that it persists, the
explanation is rejected. Furthermore, if the association in $(\gamma)$ was weaker than that
in $(C)$, there would be some indication that hygienic conditions are related to exemption
from attack, though not constituting the only factor concerned.


13.10. Associations in sub-populations are called partial associations. Analogously
to (13.10), $A$ and $B$ are said to be positively associated in the population of $C$'s if

$$(ABC) > \frac{(AC)(BC)}{(C)}, \tag{13.23}$$

where $(ABC)$ represents the number of members bearing the attributes $A$, $B$ and $C$; and
so on. We may also define coefficients of partial association, colligation, etc., such as

$$Q_{AB.C} = \frac{(ABC)(\alpha\beta C) - (A\beta C)(\alpha BC)}{(ABC)(\alpha\beta C) + (A\beta C)(\alpha BC)}, \tag{13.24}$$

which is derived from (13.14) by adding $C$ to all the symbols representing the frequencies.


Example 13.4

Galton's "Natural Inheritance" gives some particulars, for 78 families containing not
less than six brothers or sisters, of eye-colour in parent and child. Denoting a light-eyed
child by $A$, a light-eyed parent by $B$ and a light-eyed grandparent by $C$, we trace every
possible line of descent and record whether a light-eyed child has light-eyed parent and
grandparent, the number of such being denoted by $(ABC)$ and so on. The symbol $(A\beta\gamma)$,
for example, denotes the number of light-eyed children whose parents and grandparents
have not light eyes. The eight possible classes are

    $(ABC)$ = 1928                     $(\alpha BC)$ = 303
    $(AB\gamma)$ = 596                 $(\alpha B\gamma)$ = 225
    $(A\beta C)$ = 552                 $(\alpha\beta C)$ = 395
    $(A\beta\gamma)$ = 508             $(\alpha\beta\gamma)$ = 501

The first question we discuss is: does there exist any association between parent and
offspring with regard to eye-colour? We consider both the grandparent-parent group
(association of $B$'s and $C$'s) and the parent-child group (association of $A$'s and $B$'s). We
have, for the former:

Proportion of light-eyed among children of light-eyed parents,

$$\frac{(BC)}{(C)} = \frac{2231}{3178} = 70.2 \text{ per cent.}$$

Proportion of light-eyed among children of not-light-eyed parents,

$$\frac{(B\gamma)}{(\gamma)} = \frac{821}{1830} = 44.9 \text{ per cent.};$$

and for the latter, analogously,

$$\frac{(AB)}{(B)} = \frac{2524}{3052} = 82.7 \text{ per cent.},$$

$$\frac{(A\beta)}{(\beta)} = \frac{1060}{1956} = 54.2 \text{ per cent.}$$

Frequencies such as $(A\beta)$ are calculable direct from the eight classes given above, e.g.
$(A\beta) = (A\beta C) + (A\beta\gamma) = 552 + 508 = 1060$.

Evidently there is some positive association between parent and offspring in regard to
eye-colour.

Consider now the relationship between eye-colours of grandparents and grandchildren.
We have:

Proportion of light-eyed among grandchildren of light-eyed grandparents

$$\frac{(AC)}{(C)} = \frac{2480}{3178} = 78.0 \text{ per cent.}$$

Proportion of light-eyed among grandchildren of not-light-eyed grandparents

$$\frac{(A\gamma)}{(\gamma)} = \frac{1104}{1830} = 60.3 \text{ per cent.}$$

The association between grandparents and grandchildren is also positive.
In tabular form the data are:

    Parents and grandparents
                       Grandparents
    Parents        $C$       $\gamma$     Totals
    $B$           2231        821          3052
    $\beta$        947       1009          1956
    Totals        3178       1830          5008

    Children and parents
                       Parents
    Children       $B$       $\beta$      Totals
    $A$           2524       1060          3584
    $\alpha$       528        896          1424
    Totals        3052       1956          5008

    Children and grandparents
                       Grandparents
    Children       $C$       $\gamma$     Totals
    $A$           2480       1104          3584
    $\alpha$       698        726          1424
    Totals        3178       1830          5008

The coefficients of association and colligation $Q$ and $Y$ are

                                       Q         Y
    Grandparents-parents             0.487     0.260
    Parents-children                 0.603     0.335
    Grandparents-grandchildren       0.401     0.209


Now the question arises: is the resemblance between grandparent and grandchild due
merely to that between grandparent and parent, parent and child? To investigate this,
we must consider the associations of grandparent and grandchild in the sub-populations
"parents light-eyed" and "parents not-light-eyed," that is, the associations of $A$ and $C$
in $(B)$ and $(\beta)$. We have:

Parents light-eyed

Proportion of light-eyed amongst grandchildren of light-eyed grandparents

$$\frac{(ABC)}{(BC)} = \frac{1928}{2231} = 86.4 \text{ per cent.}$$

Proportion of light-eyed amongst grandchildren of not-light-eyed grandparents

$$\frac{(AB\gamma)}{(B\gamma)} = \frac{596}{821} = 72.6 \text{ per cent.}$$

Parents not light-eyed

Proportion of light-eyed amongst the grandchildren of light-eyed grandparents

$$\frac{(A\beta C)}{(\beta C)} = \frac{552}{947} = 58.3 \text{ per cent.}$$

Proportion of light-eyed amongst the grandchildren of not-light-eyed grandparents

$$\frac{(A\beta\gamma)}{(\beta\gamma)} = \frac{508}{1009} = 50.3 \text{ per cent.}$$

In both cases the partial association is well marked and positive. The association
between grandparents and grandchildren cannot, then, be due wholly to the associations
between grandparents and parents, parents and children. There is ancestral heredity, as
it is called, as well as parental heredity. The relevant four-fold tables are:


    Parents light-eyed
                        Grandparents
    Children        $BC$      $B\gamma$     Totals
    $AB$            1928        596           2524
    $\alpha B$       303        225            528
    Totals          2231        821           3052

    Parents not-light-eyed
                        Grandparents
    Children        $\beta C$   $\beta\gamma$   Totals
    $A\beta$          552         508            1060
    $\alpha\beta$     395         501             896
    Totals            947        1009            1956

The coefficients of association and colligation are:

    $Q_{AC.B}$ = 0.412        $Q_{AC.\beta}$ = 0.159
    $Y_{AC.B}$ = 0.216        $Y_{AC.\beta}$ = 0.080
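
The total and partial coefficients of this example can all be obtained from the eight ultimate class frequencies. The sketch below is one way of organising the arithmetic; the dictionary layout (keys are triples of booleans for the presence of $A$, $B$, $C$) and the function names `yule_q` and `q_between` are assumptions introduced for illustration.

```python
# Eight class frequencies of Example 13.4, keyed by (A present, B present, C present).
counts = {
    (True, True, True): 1928, (True, True, False): 596,
    (True, False, True): 552, (True, False, False): 508,
    (False, True, True): 303, (False, True, False): 225,
    (False, False, True): 395, (False, False, False): 501,
}


def yule_q(a, b, c, d):
    """Coefficient of association (13.14) for the table with rows (a, b), (c, d)."""
    return (a * d - b * c) / (a * d + b * c)


def q_between(counts, i, j, fixed=None):
    """Q between attributes i and j (0 = A, 1 = B, 2 = C).

    `fixed` optionally maps another attribute index to True/False, which
    restricts the calculation to that sub-population (partial association).
    """
    fixed = fixed or {}
    cell = {}
    for key, n in counts.items():
        if all(key[k] == value for k, value in fixed.items()):
            pair = (key[i], key[j])
            cell[pair] = cell.get(pair, 0) + n
    return yule_q(cell[(True, True)], cell[(True, False)],
                  cell[(False, True)], cell[(False, False)])


q_ac = q_between(counts, 0, 2)                        # grandparents-grandchildren
q_ac_b = q_between(counts, 0, 2, fixed={1: True})     # parents light-eyed
q_ac_beta = q_between(counts, 0, 2, fixed={1: False})  # parents not light-eyed
print(f"Q_AC = {q_ac:.3f}, Q_AC.B = {q_ac_b:.3f}, Q_AC.beta = {q_ac_beta:.3f}")
```

Summing over the attribute that is not fixed reproduces the collapsed four-fold tables, so the same function serves for total and partial coefficients alike.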


13.11. If there are $p$ different attributes under consideration the number of partial
associations can become very large, even for moderate $p$. For example, we can choose
two in $\binom{p}{2}$ ways and consider their associations in all the possible sub-populations of the
other $(p - 2)$, which are seen to be $3^{p-2}$ in number. Thus there are $\binom{p}{2} 3^{p-2}$ associations.
In practice, however, we need only consider a few of them.









One result in this connection is worth noticing. We have, generalising $\delta$ of equation
(13.13),

$$\delta_{AB.C} + \delta_{AB.\gamma} = (ABC) - \frac{(AC)(BC)}{(C)} + (AB\gamma) - \frac{(A\gamma)(B\gamma)}{(\gamma)}$$

$$= (AB) - \frac{(AC)(BC)}{(C)} - \frac{\{(A) - (AC)\}\{(B) - (BC)\}}{(\gamma)}$$

$$= \delta_{AB} - \frac{N}{(C)(\gamma)}\, \delta_{AC}\, \delta_{BC}. \tag{13.25}$$

If, then, $A$ and $B$ are independent in both $(C)$ and $(\gamma)$, $\delta_{AB.C} = \delta_{AB.\gamma} = 0$ and

$$\delta_{AB} = \frac{N}{(C)(\gamma)}\, \delta_{AC}\, \delta_{BC}, \tag{13.26}$$

i.e. they are not independent in the population as a whole unless $C$ is independent of $A$ or $B$
or both in that population.

This peculiar result indicates that illusory associations may arise when two populations
$(C)$ and $(\gamma)$ are amalgamated, or that real associations may be obscured. If $A$ and $C$,
$B$ and $C$, are associated we have, from (13.25),

$$\delta_{AB} = \frac{N}{(C)(\gamma)}\, \delta_{AC}\, \delta_{BC} + \delta_{AB.C} + \delta_{AB.\gamma},$$

so that even if $A$ and $B$ are independent in $(C)$ and $(\gamma)$ they will appear as associated in
the two together. Again, if $A$ and $B$ are associated positively in $(C)$ and negatively in $(\gamma)$,
$\delta_{AB}$ may be zero, that is to say, they may appear as independent in the whole population.
Example 13.5

Consider the case in which a number of patients are treated for a disease and there
is noted the number of recoveries. Denoting recovery by $A$, not-recovery by $\alpha$, treatment
by $B$, not-treatment by $\beta$, suppose the frequencies are

                   $B$      $\beta$     Totals
    $A$            100        200         300
    $\alpha$        50        100         150
    Totals         150        300         450

Here $(AB) = 100 = \dfrac{(A)(B)}{N}$, so that the attributes are independent. So far as can be
seen, treatment exerts no effect on recovery.

Denoting male sex by $C$ and female sex by $\gamma$, suppose the frequencies among males
and females are

    Males
                   $BC$     $\beta C$    Totals
    $AC$            80        100          180
    $\alpha C$      40         80          120
    Totals         120        180          300

    Females
                      $B\gamma$   $\beta\gamma$   Totals
    $A\gamma$            20          100            120
    $\alpha\gamma$       10           20             30
    Totals               30          120            150

In the male group we now have

$$Q_{AB.C} = \frac{(80 \times 80) - (100 \times 40)}{(80 \times 80) + (100 \times 40)} = 0.231,$$

and in the female group

$$Q_{AB.\gamma} = -0.429.$$

Thus among the males treatment is positively associated with recovery, and among the
females negatively associated. The apparent independence in the two together is due to
the cancelling of these associations in the sub-populations.
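
The cancellation described here can be checked numerically against the identity (13.25). The following sketch uses the frequencies of this example; the variable names (with `b` and `g` standing for $\beta$ and $\gamma$, `a` for $\alpha$) and the helper `delta` are assumptions made for the illustration.

```python
def delta(joint, total_first, total_second, n):
    """Observed minus "independence" frequency: joint - total_first*total_second/n."""
    return joint - total_first * total_second / n


# Class frequencies of Example 13.5 (b = beta, a = alpha, g = gamma).
ABC, AbC, aBC, abC = 80, 100, 40, 80     # males (C)
ABg, Abg, aBg, abg = 20, 100, 10, 20     # females (gamma)

C = ABC + AbC + aBC + abC                # (C)
g = ABg + Abg + aBg + abg                # (gamma)
N = C + g
A = ABC + AbC + ABg + Abg                # (A)
B = ABC + aBC + ABg + aBg                # (B)
AB = ABC + ABg
AC = ABC + AbC
BC = ABC + aBC

d_AB = delta(AB, A, B, N)                # whole population
d_AB_C = delta(ABC, AC, BC, C)           # within males
d_AB_g = delta(ABg, A - AC, B - BC, g)   # within females
d_AC = delta(AC, A, C, N)
d_BC = delta(BC, B, C, N)

# Right-hand side of (13.25) rearranged for delta_AB.
rhs = d_AB_C + d_AB_g + N / (C * g) * d_AC * d_BC
print(d_AB, d_AB_C, d_AB_g, d_AC, d_BC, rhs)
```

In this example the two partial differences do not cancel one another directly; their sum is balanced by the product term arising from the association of sex with both recovery and treatment.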


Contingency 

13.12. We now turn to the more general case in which a population is divided into
a number of categories $A_1, A_2, \dots, A_p$ instead of simply dichotomised into two, $A$ and
not-$A$. If there is a second classification into $B_1, B_2, \dots, B_q$ the frequencies may be
arranged in the form:

                A_1          A_2          ...      A_p          Totals
    B_1       (A_1 B_1)    (A_2 B_1)      ...    (A_p B_1)      (B_1)
    B_2       (A_1 B_2)    (A_2 B_2)      ...    (A_p B_2)      (B_2)
    ...          ...          ...         ...       ...          ...
    B_q       (A_1 B_q)    (A_2 B_q)      ...    (A_p B_q)      (B_q)
    Totals     (A_1)        (A_2)         ...     (A_p)           N          (13.27)

This is known as a contingency table. We have already encountered an example in
Table 12.4. Ordinary bivariate frequency tables can, of course, be regarded as contingency
tables, but there is a difference; in the bivariate table the order of rows and columns is
determined by the variate-values, whereas in the contingency table the order of rows and
columns is, in general, arbitrary.

In (13.27) the frequency in the $i$th column and $j$th row is denoted by $(A_i B_j)$. As
in the case of the 2 × 2 table we write

$$\delta_{ij} = (A_i B_j) - \frac{(A_i)(B_j)}{N} \tag{13.28}$$

and define the attributes as independent if every $\delta$ is zero. We have:

$$\chi^2 = \sum_{i,j} \frac{N\, \delta_{ij}^2}{(A_i)(B_j)} \tag{13.29}$$

$$= N \left\{ \sum_{i,j} \frac{(A_i B_j)^2}{(A_i)(B_j)} - 1 \right\}. \tag{13.30}$$

$\chi^2$ is sometimes called the square contingency and $\chi^2/N$ the mean square contingency.


13.13. We have already seen in Chapter 12 how $\chi^2$ may be used to test the hypothesis
that the observed frequencies could have arisen by random sampling from a population
in which the attributes were independent. We now proceed to consider the construction
of measures of dependence.

Evidently $\chi^2 = 0$ if and only if each $\delta = 0$. Thus $\chi^2$ vanishes if and only if the
attributes are independent in the observed population. Furthermore, as $\chi^2$ becomes
larger the observed frequencies deviate more and more from the "independence"
frequencies; it thus provides some sort of measure of the strength of relationship in
contingency tables. For example, in the 2 × 2 table we have

$$V^2 \text{ (equation (13.16))} = \frac{\chi^2}{N}, \tag{13.31}$$

which illustrates the relationship between $V$ and $\chi^2$.

$\chi^2$ itself, however, does not constitute a very useful coefficient since it may increase
without limit. Following Karl Pearson we may put

$$C = \sqrt{\frac{\chi^2}{N + \chi^2}} \tag{13.32}$$

and call $C$ Pearson's coefficient of contingency. Even this has its limitations. It vanishes,
as it should, when there is complete independence; but in general it cannot attain unity.
Consider, for example, a $t \times t$ table in which the diagonals $(A_i B_i)$ are of frequency $a_i$ and
all other compartments are zero. Obviously no greater degree of dependence is possible.
We then have

$$\chi^2 = N(t - 1) \quad \text{and} \quad C = \sqrt{\frac{t - 1}{t}}. \tag{13.33}$$

If $t = 5$, for example, the maximum value of $C$ is 0.894.

To remedy this defect Tschuprow proposed the coefficient

$$T = \left\{ \frac{\chi^2}{N \sqrt{(p - 1)(q - 1)}} \right\}^{1/2}. \tag{13.34}$$

This can attain unity when $p = q$ but it is still not clear how it behaves when $p$ and
$q$ are unequal.

Example 13.6

TABLE 13.1

Distribution of Schoolchildren according to Intelligence and Standard of Clothing.
(From W. H. Gilby (1911), Biometrika, 8, 94.)

                         A and B     C      D      E      F      G     Totals
    Very well clad          33       48    113    209    194     39      636
    Well clad               41      100    202    255    138     15      751
    Poor but passable       39       58     70     61     33      4      265
    Very badly clad         17       13     22     10     10      1       73
    Totals                 130      219    407    535    375     59     1725

The above table shows the distribution of 1725 schoolchildren who were classified
(1) according to their standard of clothing and (2) according to their intelligence, the
standards in the latter case being A = mentally deficient, B = slow and dull, C = dull,
D = slow but intelligent, E = fairly intelligent, F = distinctly capable, G = very able.
Required to discuss whether there is any association between standards of clothing and
intelligence.

We note in the first place that a table of this kind could, theoretically, be discussed
by considering all the possible 2 × 2 comparisons to be extracted from it; e.g. for the
corners of the table we have

                         A and B      G      Totals
    Very well clad          33        39       72
    Very badly clad         17         1       18
    Totals                  50        40       90

Here, for example, 54 per cent. of the very well clad were very able, but only 5 per cent.
of the very badly clad. However, what we really require is not a series of individual
comparisons of this kind but a general comparison over the whole table, and it is for such
purposes that the coefficient of contingency is designed.

We then proceed to work out the "independence" frequencies, e.g. that in the top
left-hand corner of the table is $\dfrac{636 \times 130}{1725} = 47.930$. The contribution to $\chi^2$ from this
compartment is then $\dfrac{(14.930)^2}{47.930} = 4.651$. It will be found that the sum of the contributions
from the 24 compartments is 174.92. We then have

$$C = \sqrt{\frac{174.92}{1725 + 174.92}} = 0.303,$$

indicating a considerable degree of association. For the Tschuprow coefficient we have

$$T = \sqrt{\frac{174.92}{1725 \sqrt{15}}} = 0.162.$$

There is evidently some general relationship between the two attributes, though not
a very strong one. The reader may verify for himself by using the $\chi^2$ test that the values
of $C$ and $T$ are significant, i.e. could not have arisen by sampling from independent attributes
in all probability.
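
The arithmetic of this example may be repeated mechanically. The sketch below takes the 24 cell frequencies of Table 13.1 and uses the form (13.30) of $\chi^2$ rather than summing the separate contributions; the variable names are assumptions introduced for the illustration, and small differences in the last figures from the values quoted in the text are to be expected from rounding.

```python
import math

# Table 13.1: rows are standards of clothing (very well clad ... very badly
# clad); columns are intelligence grades "A and B", C, D, E, F, G.
table = [
    [33, 48, 113, 209, 194, 39],
    [41, 100, 202, 255, 138, 15],
    [39, 58, 70, 61, 33, 4],
    [17, 13, 22, 10, 10, 1],
]

row_totals = [sum(row) for row in table]
col_totals = [sum(col) for col in zip(*table)]
n = sum(row_totals)

# "Independence" frequency for the top left-hand compartment.
top_left = row_totals[0] * col_totals[0] / n

# chi2 = N * (sum of observed^2 / (row total * column total) - 1), as in (13.30).
s = sum(
    observed * observed / (row_totals[i] * col_totals[j])
    for i, row in enumerate(table)
    for j, observed in enumerate(row)
)
chi2 = n * (s - 1)

rows, cols = len(table), len(table[0])
c_coef = math.sqrt(chi2 / (n + chi2))                                 # (13.32)
t_coef = math.sqrt(chi2 / (n * math.sqrt((rows - 1) * (cols - 1))))   # (13.34)

print(f"N = {n}, top-left independence frequency = {top_left:.3f}")
print(f"chi2 = {chi2:.2f}, C = {c_coef:.3f}, T = {t_coef:.3f}")
```

Form (13.30) avoids computing the 24 independence frequencies separately, which is convenient when only the total is wanted.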

13.14. The sampling variance of the coefficient of contingency is difficult to arrive
at in virtue of sheer algebraical complexity; and it is not clear how far the use of a standard
error is legitimate in this connection. Reference may be made for the formulae to K. Pearson
(1915a) and Kondo (1929). For the even more complicated question of partial contingency
see K. Pearson (1915b).


13.15. In concluding this chapter we point out that all the measures of association
and contingency discussed therein in no way rely on the possibility of the measurement
of attributes on a variate-scale, or even on the possibility of arranging them in order.
Rearrangement of rows and columns in the two-way tables does not affect the values of
the coefficients.* In the next chapter we shall consider the relationship between variates
and certain coefficients based on the assumption that the attribute classifications are
made according to the divisions of a variate-scale. These coefficients (tetrachoric $r$, biserial
$\eta$, etc.) have been used as measures of association, but they are essentially different in
character from those discussed in this chapter. The reader who refers to memoirs written
on this subject between 1900 and 1920 will find it useful to remember this fact.

* Except that it may change the sign of a coefficient of association. This is equivalent to a slight
change of standpoint in what is regarded as a positive association; for example, positive association
between fair hair and blue eyes is equivalent to negative association between fair hair and not-blue eyes.


NOTES AND REFERENCES 

The fundamental memoir on association of attributes is that of Yule (1900), who
introduced the coefficient $Q$ in it. In a later paper (1912) Yule reviewed the whole subject
and proposed the coefficient denoted in this chapter by $Y$. This memoir contained some
criticisms of Karl Pearson's coefficient now known as tetrachoric $r$ (cf. Chapter 14) and
evoked a reply from Pearson and Heron (1913) which is remarkable for having missed
the point over more pages (173) than perhaps any other memoir in statistical history.

Pearson's coefficient of contingency $C$ was introduced in 1904. Corrections to this
coefficient were subsequently proposed, being based on the notion of an underlying variate
(K. Pearson, 1913). For references to the other coefficients proposed on this basis, see
Chapter 14.

Kondo, T. (1929), "On the standard error of the mean square contingency," Biometrika,
21, 376.

Pearson, K. (1904), "On the theory of contingency and its relation to association and
normal correlation," Drapers' Company Research Memoirs, Biometric Series I,
Dulau and Co., London.

- (1913), "On the measurement of the influence of broad categories on correlation,"
Biometrika, 9, 116.

- and Heron, D. (1913), "On theories of association," Biometrika, 9, 159.

- (1915a), "On the probable error of a coefficient of mean square contingency,"
Biometrika, 10, 590.

- (1915b), "On the general theory of multiple contingency, with special reference to
partial contingency," Biometrika, 11, 145.

Yule, G. U. (1900), "On the association of attributes in statistics," Phil. Trans., A, 194, 257.

- (1912), "On the methods of measuring the association between two attributes,"
Jour. Roy. Statist. Soc., 75, 579.


EXERCISES

13.1. Show that the coefficient of association is greater in absolute value than the
coefficient of colligation, except when both are zero or unity in absolute value.

13.2. Show that for a contingency table with a constant number of rows and columns
the Pearson coefficient of contingency $C$ is equal to the Tschuprow coefficient $T$ for two
values of $\chi^2$, one of which is zero; that for $\chi^2$ between these values $C > T$, and for $\chi^2$
greater than the higher value $T > C$.





13.3. The following table shows 68 lobelia plants classified according to whether they
were cross- or self-fertilised and above or below average height.

                        Above average    Below average    Totals
    Cross-fertilised         17               17            34
    Self-fertilised          12               22            34
    Totals                   29               39            68

Show that $Y = 0.150$ and that this is not significant of association if these data are
a random sample from lobelia plants generally.

13.4. In the hair- and eye-colour data of Table 12.4 show that C = 0.37 and T = 0.25.


13.5. In a paper discussing whether laterality of hand is associated with laterality
of eye (measured by astigmatism, acuity of vision, etc.) T. L. Woo obtained the following
results (Biometrika, 20a, pp. 79-148):

Ocular Laterality for General Astigmatism. 


Show that laterality of eye is only slightly associated with laterality of hand, and that 
the association is not significant. 


13.6. In the notation of 13.5 show that

\[ Q = \frac{2Y}{1 + Y^2}. \]






CHAPTER 14 


PRODUCT-MOMENT CORRELATION 


14.1. At the end of Chapter 1 there were given a few examples of bivariate frequency
tables. We now proceed to consider such tables in greater detail and to discuss methods 
of measuring the dependence of the two variates represented in them. It is, of course, 
possible to treat the problem by the methods of the previous chapter and regard the tables 
as contingency tables ; but when data are classified according to a numerical variable more 
exact methods are available in an important class of cases. 

The types of bivariate distribution arising in practice are not so easy to classify as the 
univariate types. Table 1.15 on page 20, showing the distribution of beans according to 
length and breadth, and Table 1.25 on page 27 showing the number of cows according to 
age and milk yield, evidently correspond more or less to the unimodal univariate distribution, 
for not only the border frequencies but the frequencies in individual rows and columns are 
of the unimodal type. Biometric distributions are often of this character. On the other
hand, Table 1.26 on page 28, showing discount rates and bank reserves, has the border
column of the unimodal type and the border row of the J-shaped type. In Tables 14.1 to
14.3 are given three more examples of the kind of material encountered in practice. 
Table 14.1 shows the distribution of a number of persons according to age and highest 
audible pitch ; Table 14.2 the distribution of registration districts according to proportion 
of male births and total number of births ; and Table 14.3 shows the distribution of sons 
according to stature of son and stature of father. 







TABLE 14.1

Distribution of 3379 Persons according to Age and Highest Audible Pitch
(From Y. Koga and G. M. Morant, Biometrika, 15, 346.)

The numbers in brackets are explained in Example 14.1, p. 331.

Age, years











TABLE 14.2

Showing the Number of Registration Districts in England and Wales exhibiting (1) a given
Proportion of Male Births, (2) a given Total Number of Births during the Decade 1881-90.

(The Data as to Total Births and Numbers of Male and Female Births from Decennial
Supplement to Report of the Registrar-General. Table from H. D. Vigor and G. U. Yule,
Jour. Roy. Stat. Soc., 69, 1906.)










(1) Proportion of Male Births per 1000 of all Births. 










1 


w 

1- 

4. 

1- 

•<# 

4. 

f: 

t 



1 


J 


s 

1 

1 

lA 

tA 

lA 


1 

vA 

1 

s 

1 

o 

i 

rM 

*2 

lA 

1 

tn 

t- 

s 


1 

u> 

Totaxs. 

0- 4 

1 


1 



2 

1 

2 

2 

4 

8 

9 

18 

10 

12 

21 

14 

12 

0 

5 

3 

1 

1 

2 


1 

1 

149 

4- 8 

— 


— 



— 

1 


2 

2 

5 

7 

20 

27 

29 

42 

86 

18 

10 

4 

— 

— 

— 

1 



... 

204 

8- J2 




— 

— 

— 

— 

— 

— 

— 

2 

1 

5 

15 

17 

18 

16 

7 

4 

1 

— 

— 

— 


.... 



8ft 

12- 16 

— 

— 

— 


— 


— 

— 

— 

— 

— 

— 

1 

10 

8 


8 

« 

1 

1 

1 

— 

— 

_ 

— 

— 

... 

48 

1ft- 20 



— 

— 

— 


— 


--- 


— 

-- 


5 

6 

9 

4 

1 

1 

-— 




-- 




26 

20- 24 

— 

— 

— 

— 

— 





,— 

— 


— 

1 

5 

4 

3 

1 

1 

— 


-- 

— 

— 

— 

_ 


15 

24- 28 



— 

—- 

— 





— 

— 

— 

2 

3 

6 

3 

1 

1 

.— 

— 

—, 

— 

— 


— 

—, 

_ 

15 

28- 32 

— 

— 

— 


- 

— 

— 

— 

— 

— 

— 

1 

— 

2 

ft 

2 

1 


— 

— 

— 

— 

— 

— 

.w- 

— 

— 

12 

82- 8« 

— 

— 

— 

— 

— 

— 

— 

— 

— 


— 


— 

3 

3 


1 

— 

— 




— 

— 


— 

... 

7 

3«~ 40 


— 



— 

— 

— 

— 

— 

— 


— 

— 

— 

4 

— 

2 


—. 

— 

— 



—. 

— 

— 

— 

ft 

40- 44 

— 

— 

— 

— 

— 

— 

— 

— 

— 

— 

— 

— 

— 

— 

3 

4 

1 

— 

— 

— 

— 

— 

— 


— 


— 

8 

44- 48 





— 



— 



— « 


— 

2 

5 


1 

1 

- 


— 

— 


. _ 


.... 

_ 

11 

48- 52 
62- f>« 

_ 

— 

_ 

j - 

— 


! — 

— 

— 

— 

— 


— 

2 

2 

6 

5 

3 

— 

z 

— 

— 

_ 

— 

: — 

— 

— 

— 

— 

11 

7 

ftft- 60 


— 

— 1 

— 

— 

— 

- 1 

- 

— 

— 

— 

— 

— 

1 

.— 

0 


— 






— 


— 1 


3 

60- 64 


— 

— 

«. 

— 

1 — 

i — 

-- 

_ - ' 

1 

, 1 

1 ~ 1 


— 

3 

1 

1 

— 



— ' 

— 

1 — 


— 

— 

— j 

— 

5 

64- 68 


—. 

1 


- - 


1 — 

— 

— 

- 



-- 

i — 

1 

2 

— 

— 1 

— 

“ ! 

h“ 

! “ 

— 

_ 



' — 

3 

««- 72 

— 

— 

1 

1 

— 

— 

1 

- 

— 

- 1 

1 _ 1 


— 

2 

— 

1| 

i — 

-1 

— 

_ ; 

[ 



- 

— 

— j 

... 

3 

72- 76 
76- 80 



__ j 



_ 

1 

_ 

^ — 

1 — 1 

= i 

_ 1 

-1 

— 

li 

3 


_ ' 

1 

— 

I 




“■ 

!Z 

z 

z 

4 

80- 84 

— 

— 

j — 

' — 1 

- 

' — 


— 

1 — 

— 1 

-1 

- j 

— 

— 

1 

— 

— 1 



— 1 

— 

— 

— 

.. 

— 

— 

— 

1 

84- 88 

— 

— 

— 


- ■ 



— 1 

i — 

1 1 

— j 

- - 

.— 

—. 

2 

— j 

— 1 

~ 1 

. 1 


— 

— 

-w. 




— 

*2 

88- 92 


— 


- 1 

; — ■ 




1 — 

— 1 


— 1 

— 1 

— 

1 ! 

1 



"* 



: 

—. 


.... 


— 

2 

92- 9ft 
96-100 


1 

— j 


1 


1 

1 

! _ 

1 

' 1 

Z1 

j 

_ 

:: 1 

2 

_ 


— i 

- i 

i_ 

z 


—! 

— 

z 

— 1 

2 

100- 04 

— 

1 — 

— 


— 

— 

—, 

1 

' Tl 

‘ —" i 


— ' 

— 1 

— 1 

. — 




- - 1 

1 



— 

— ^ 




... 

104- 08 

—• 

•— 

— 

-j 

—. 

— 

—. 

—' 1 



— ‘ 


— 


11 


— 


— , 


Z 


— 

— 1 




1 

148- 52 

-i 



■“ i 

"" 



_1 

! - j 

1_1 

::.j 



_1 





_! 

—- 


z~”! 


-— 

— 1 

— 



1 

totam 

t 

1 

- 

1 


- 

2 

2 

2 1 

4 1 

G j 

15 1 

is! 

40 j 

98 

1 

125 1 

1 

129 * 

88 ! 


20 

11 j 

4 1 

1 j 

1 

3 

““j 

1 

1 

032 


14.2. In accordance with the definitions of the previous chapter we may say that
two variables are independent in a bivariate table if the observed frequency in the jth
row and ith column, $(A_iB_j)$, is equal to $(A_i)(B_j)/N$. In such a case, for any two rows j and k
the frequencies will be proportional to $(A_i)(B_j)$ and $(A_i)(B_k)$; so that the distribution in
any row is similar and similarly situated to that in any other row. It will, for example,
have the same mean and variance. Similarly for the columns.

The measures of dependence we shall consider are related to the extent to which row 
and column means and variances differ among themselves. If the variates are independent 
all row means are equal and all column means are equal. The converse is not true in general, 
but becomes so in an important special case when the distribution is normal. 












TABLE 14.3

Distribution of 1078 Sons according to (1) Stature of Father and (2) Stature of Son: One
or Two Sons only of each Father.

(From Karl Pearson and Alice Lee, Biometrika, 2 (1903), 415.)

Measurements in inches. Note that where a height falls on the border-line of an interval,
one-half of the individual is assigned to each contiguous interval.









(1) Stature of Father 








Totals 

s 

! 

A 

lA 

s 

lA 

W 

2 

O 

89 

»A 

ua 

Z 

•A 

»A 

wi 

I’. 

s 

•A 

i 

CO 

lA* 

«3» 

1 

S 

>A 

it 

' 

lA 

O'! 

C- 

lA 

CO 

s 

tii 

»A 

»A 

«A 

*r 

»A 

6d-5-fl0'5 




— 

0*5 

0*5 

1 











2 

60 5-ttl 5 

..... 


— 


06 

— 


— 

1 

_ 

_ 

_ 


_ 

__ 

_ 

_ 

1*6 

015-H2 6 

..... 

0*25 

0*25 

— 

0 5 

1 

0*25 

0*25 

0 5 

05 

_ 

_ 


_ 

_ 

_ 

_ 

3*5 

62‘5-635 

— 

0*25 

0-25 

2*26 

2*26 

2 

4 

6 

2 76 

1 26 


0*26 

0*25 



_ 

_ 

20*6 

63 &-64 5 

1 

— 

1*6 

8*76 

3 

4*25 

8 

9*25 

3 

1 25 

1*5 

0 75 

1 26 

— 


— 

_ 

88 5 

64*5-06 5 

2 

1 

0*6 

2 

8*26 

9*6 

13*6 

10*76 

7*5 

5*6 

3 5 

2*5 


_ 

_ 

_ 

_ 

61 5 

65*5-66 5 

—• 

0*6 

1 

2 26 

6 25 

9*5 

10 

16*76 

17 6 

16 

5*26 

2 

2*5 

1 




89*5 

66*5-67'5 

— 

1*5 

8 

4*75 

3*6 

13*75 

19*7.5 

26 5 

26*75 

19*6 

12*5 

13*75 

3 25 

0 5 

1 


— 

148 

67 .5-08-5 

— 

— 

1-6 

2 

7*5 

10 

10*25 

24*26 

31 5 

23 6 

20*5 

13*25 i 

8*5 

95 

2*25 

_ 


173*6 

68-5-69*6 

— 

.— 

1 


5*26 

6 

12 76 

18*25 

16 

24 

29 

21*5 

10 

3 5 

2*25 

_ 

1 

149*5 

60 6-70*6 

— 


— 


1 

26 

6*76 

18*76 

11*75 

19*6 

22 6 

19*5 

14*5 

6*25 

8*5 

1*6 

1 

128 

70 5-71-5 


— 

— 

— 

—, 

3*25 

5 1 

8*75 

10 75 

19 

14*75 

20 75 

10 75 

8 

5 

1 

1 

108 

71*5-72-5 

— 

[ —• 

— 

— 

— 

0 25 

3 1 

1 26 ! 

7 

7*76 

10 75 

11*26 

10 

8*5 

2 76 

0 5 


63 

72*6-73-6 




.— 

— 

— 

0 75 1 

0*76 

2*5 

7*5 

6*5 

6 

7 5 

6 25 

3*25 

05 

0 5 

42 

78 5 74*5 


—. 

— 

— 

1 

— 

1 5 

i*6 1 


5 26 

2*26 

2*5 

6 5 

3 26 

3 25 i 

... 

I ^ 

29 

74*5-75*5 

i— 

— 

—■ 

— 

— 

— 

— 

— 

— 1 

1 

2 

— 

2*5 

0*76 

176 

0*,5 


8*5 

75-6- 70*5 

.— 


— 

— 

— 


— 

—. 1 

1 — i 

] 25 

0*25 

_i 

05 

1 

1 

.. i 

— 

4 

76*6-77*5 

— 

— 

— 

— i 


1 — 

1 — 

— 

— 

1 26 

0*25 

1 ‘ 


— 

15 

— 


4 

77 5-78*5 

— 

— 

— 

— 

— 

— 

— 

— 1 

1 — 

- , 

1 

1 

.— 

0*25 

0 76 

— 

— 

3 

78 5-79*5 

— 

— 

“*"*■* 


— 

'— 


— 1 

1 — 

“ ^ 

— 

— 

— 

0 25 

0*25 

— 

— 

0*5 

Totals 

3 

3*5 1 

! 

“ i 

17 

33 6 

61 5 j 

95-5 

142 1 

1 

137 6 

jl54 

141*5 

jllO 

78 

49 

28 6 

4 

6 5 

1078 


Regression 

14.3. The rows and columns will be referred to by the general term “arrays” and
we shall consider the two variates x and y, x being taken to vary horizontally, i.e. in rows,
and y vertically, i.e. in columns. Consider then the means of arrays. Take two axes
OX and OY at right angles representing the variates x and y. On this frame of reference
plot the points whose abscissae are the centres of the x-intervals and whose ordinates are
the means of the corresponding distributions of y in the columns centred at the appropriate
x's. Similarly, plot the means of the x-distributions against the centres of the
corresponding y-intervals. (In practice it is useful to distinguish the two, the means of x's being
denoted by small circles and those of y's by crosses.) Fig. 14.1 shows such a diagram
for the data of Table 14.3 and Fig. 14.2 for the data of Table 14.2.

The means of arrays will in general lie more or less closely round smooth curves. For 
example, in Fig. 14.1 they lie approximately on straight lines, whereas in Fig. 14.2 one set of 
means certainly does not. Such curves are called regression curves and their equations with 
respect to OX and OY are called regression equations. If the lines are straight the regression
is said to be linear; if not, curvilinear or skew. 

To put these geometrically expressed ideas in analytical language, suppose the mean
of x in the array centred at $y_i$ is $\bar{x}_i$. Then the points $(\bar{x}_i, y_i)$ may be represented by a
functional equation

\[ x = f(y) \tag{14.1} \]

which is the regression equation of x on y. If the regression is linear,

\[ x = \beta y + \alpha. \]

There will also be an equation

\[ y = \phi(x), \tag{14.2} \]

the regression of y on x.


Fig. 14.1. Regression Lines of Data of Table 14.3.
Means of rows shown by circles, and corresponding
regression line by RR; means of columns shown by
crosses, and corresponding regression line by CC.
(Axes: father's stature and son's stature, in inches.)

Fig. 14.2. Regression Lines of Data of Table 14.2.
Means of rows shown by circles, and corresponding
regression line by RR; means of columns shown by
crosses, and regression line by CC.
(Horizontal axis: proportion of male births per 1000 births.)


In this chapter we shall mainly be concerned with the case in which regressions are 
linear or very nearly so. 

14.4. In an observed distribution the means of arrays will not as a rule lie exactly
on straight lines, or indeed on any simple curves, although they may be very near to doing
so. The question then arises: if the regression is “approximately” linear, what is the best
line to take as the regression line? The question may be answered by an appeal to the
method of least squares. The regression of x on y, say $x = \beta y + \alpha$, will be determined by
minimising the sum

\[ V = \sum N_i(\bar{x}_i - \beta y_i - \alpha)^2, \tag{14.3} \]

the summation extending over all y-intervals. Here $N_i$ represents the frequency in the
ith row, and note that $\sum N_i\bar{x}_i$ is equal to the total frequency N times the mean of x for
the whole distribution.

From (14.3) we have, for the minimal values of $\alpha$ and $\beta$,

\[ 2\sum N_i(\bar{x}_i - \beta y_i - \alpha) = 0 \tag{14.4} \]

\[ 2\sum N_i y_i(\bar{x}_i - \beta y_i - \alpha) = 0. \tag{14.5} \]

Now choose the origin at the means of x and y for the distribution. Then $\sum N_i\bar{x}_i = 0$
and $\sum N_i y_i = 0$ and hence, from (14.4),

\[ \alpha = 0. \]

From (14.5) we then have

\[ \sum(N_i y_i \bar{x}_i) - \beta\sum(N_i y_i^2) = 0. \]

Now since the origin has been chosen at the mean of x and y, $\sum(N_i y_i\bar{x}_i) = N \operatorname{cov}(x, y)$
and $\sum(N_i y_i^2) = N \operatorname{var} y$. Hence we have

\[ \beta = \frac{\operatorname{cov}(x, y)}{\operatorname{var} y}. \tag{14.6} \]

The equation of the regression of x on y, taking X and Y to be current co-ordinates,
will then be

\[ X = \frac{\operatorname{cov}(x, y)}{\operatorname{var} y}\, Y. \tag{14.7} \]

Referred to an arbitrary origin for which the means of x, y are $\bar{x}$, $\bar{y}$, the equation is

\[ (X - \bar{x}) = \frac{\operatorname{cov}(x, y)}{\operatorname{var} y}\,(Y - \bar{y}). \tag{14.8} \]

Similarly we find for the regression of y on x

\[ (Y - \bar{y}) = \frac{\operatorname{cov}(x, y)}{\operatorname{var} x}\,(X - \bar{x}). \tag{14.9} \]

Equations (14.8) and (14.9) are fundamental. If the regressions are exactly linear they
give the regression equations; if not, they give the “best” straight regression lines in the
sense of the method of least squares.
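
The least-squares property of (14.8) and (14.9) can be checked numerically. What follows is only a minimal sketch in Python, not part of the original text: it treats ungrouped pairs (each array holding a single observation), and the function names `regression_lines` and `sum_of_squares`, as well as the five data pairs, are hypothetical choices made for illustration.

```python
def regression_lines(xs, ys):
    """Least-squares regression coefficients for paired observations.

    Returns (mean_x, mean_y, beta_1, beta_2), where beta_1 = cov/var y is the
    slope in (14.8) and beta_2 = cov/var x the slope in (14.9).
    Moments use the divisor N, as in the text.
    """
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / n
    var_x = sum((x - mean_x) ** 2 for x in xs) / n
    var_y = sum((y - mean_y) ** 2 for y in ys) / n
    return mean_x, mean_y, cov / var_y, cov / var_x


def sum_of_squares(xs, ys, slope, intercept):
    """Sum of squared deviations of x from the line x = slope * y + intercept."""
    return sum((x - slope * y - intercept) ** 2 for x, y in zip(xs, ys))


# Hypothetical illustrative pairs (not taken from any table in the text).
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.0, 1.0, 4.0, 3.0, 5.0]

mean_x, mean_y, beta_1, beta_2 = regression_lines(xs, ys)
alpha = mean_x - beta_1 * mean_y  # intercept of the regression of x on y
best = sum_of_squares(xs, ys, beta_1, alpha)
```

Moving the slope or the intercept away from the fitted values can only increase `sum_of_squares`, which is the sense in which the line is "best".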


The Coefficient of Product-moment Correlation 

14.5. The coefficients $\dfrac{\operatorname{cov}(x, y)}{\operatorname{var} y}$ and $\dfrac{\operatorname{cov}(x, y)}{\operatorname{var} x}$ are called regression coefficients and
will be denoted by $\beta_1$ and $\beta_2$ respectively.*

We now define

\[ \rho = \frac{\operatorname{cov}(x, y)}{(\operatorname{var} x \operatorname{var} y)^{1/2}}, \quad \text{so that } \rho^2 = \beta_1\beta_2. \tag{14.10} \]

$\rho$ is called the coefficient of product-moment correlation or briefly the correlation coefficient.
It provides an important measure of the relation between two variates for which the
regressions are approximately linear. In this expression the square root is to have positive
sign.

* There is little danger of confusion between this notation and the use of $\beta_1$, $\beta_2$ to indicate
measures of skewness and kurtosis. The two rarely occur in the same context.

Let us note in the first place that $\rho$ cannot be greater than unity in absolute value.
For we have, taking an origin at the means and summing for all pairs of values of x, y over
the population,

\[ \begin{aligned}
\sum(x - \beta_1 y)^2 &= \sum(x^2) - 2\beta_1\sum(xy) + \beta_1^2\sum(y^2) \\
&= \sum(x^2)\left\{1 - \frac{2\beta_1\sum(xy)}{\sum(x^2)} + \frac{\beta_1^2\sum(y^2)}{\sum(x^2)}\right\} \\
&= \sum(x^2)(1 - \rho^2).
\end{aligned} \tag{14.11} \]

Thus $1 - \rho^2$ cannot be negative.

Furthermore, if $\rho = \pm 1$, $\sum(x - \beta_1 y)^2 = 0$ and hence every $x - \beta_1 y = 0$. Thus the
variates are linearly related by the equation $X - \beta_1 Y = 0$. If $\rho = 0$ the regression
equations become X = 0, Y = 0, for then cov (x, y) = 0. Hence the means of arrays are
the same for all arrays.
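
The identity (14.11) is easy to confirm on a small set of numbers. The sketch below is an illustration only; the deviations `xs` and `ys` are hypothetical values measured from their means, and the variable names are not from the text.

```python
import math

# Hypothetical deviations from the means (each list sums to zero).
xs = [-2.0, -1.0, 0.0, 1.0, 2.0]
ys = [-1.0, -2.0, 1.0, 0.0, 2.0]

sum_xx = sum(x * x for x in xs)
sum_yy = sum(y * y for y in ys)
sum_xy = sum(x * y for x, y in zip(xs, ys))

beta_1 = sum_xy / sum_yy                    # regression coefficient of x on y
rho = sum_xy / math.sqrt(sum_xx * sum_yy)   # correlation coefficient

# Left- and right-hand sides of (14.11).
lhs = sum((x - beta_1 * y) ** 2 for x, y in zip(xs, ys))
rhs = sum_xx * (1 - rho ** 2)
```

Since `lhs` is a sum of squares it cannot be negative, and the equality with `rhs` is what forces the correlation coefficient to lie between -1 and +1.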


14.6. If $\rho = +1$ we say that the variates are perfectly positively correlated; if
$0 < \rho < 1$, that they are positively correlated; if $\rho = 0$, that they are uncorrelated; if
$0 > \rho > -1$, that they are negatively correlated; and if $\rho = -1$, that they are perfectly
negatively correlated.

“Uncorrelated” is not the same thing as “independent.” If the variates are
independent they are uncorrelated, but not vice-versa. Table 14.2 and Fig. 14.2 illustrate this
point. The regression lines, as shown in the figure, are close to X = 0, Y = 0 and the
correlation is, in fact, very small (-0.014). But the variates are obviously far from
independence and if the data are grouped, in columns up to 494.5, by single columns up
to 521.5, and over 521.5, and by rows 8-11, 12-13, 14-15, 16-17, 18-19, 20-21, 22-23,
24-25, 26-27, 28 and over, the coefficient of contingency is 0.47.
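
A much smaller artificial case makes the same point. The sketch below is a hypothetical illustration, not data from the text: y is taken to be exactly the square of x, so the variates are completely dependent, yet the covariance (and hence the correlation) vanishes because x is symmetrical about zero.

```python
from collections import defaultdict

# Hypothetical values: y is completely determined by x.
xs = [-2, -1, 0, 1, 2]
ys = [x * x for x in xs]

n = len(xs)
mean_x = sum(xs) / n
mean_y = sum(ys) / n

# Covariance by (14.12): product sum over N, less the product of the means.
cov = sum(x * y for x, y in zip(xs, ys)) / n - mean_x * mean_y

# Means of the x-arrays, one array for each distinct value of y.
arrays = defaultdict(list)
for x, y in zip(xs, ys):
    arrays[y].append(x)
array_means = {y: sum(values) / len(values) for y, values in arrays.items()}
```

Every array of x has the same mean, so the regression of x on y is the line X = 0, while the means of y for given x lie on a parabola rather than a straight line.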


14.7. The calculation of the correlation coefficient in numerical examples requires
that of the means and variances of the two variates and their covariance. The last is the
only new type appearing, the others being calculable from border frequencies in the manner
exemplified in Chapter 3.

Taking an arbitrary origin, we have, if the means of x and y are a and b,

\[ \begin{aligned}
N \operatorname{cov}(x, y) &= \sum(x - a)(y - b) \\
&= \sum(xy) - a\sum(y) - b\sum(x) + Nab \\
&= \sum(xy) - Nab,
\end{aligned} \]

\[ \operatorname{cov}(x, y) = \frac{\sum(xy)}{N} - ab. \tag{14.12} \]

Thus we may, as in the univariate case, take an arbitrary origin for arithmetical convenience,
calculate the product sum $\sum(xy)$ and determine the covariance by the use of (14.12). The
calculation of the product sum is exemplified below. As seen in 3.30, no Sheppard's
corrections are required for the first product moment.


Example 14.1

To find the correlation coefficient and regression lines for the data of Table 14.1.

We find the means and variances from the border columns in the usual way.
An arbitrary mean is taken for x (age) at the centre of the interval 20.5- years and for
y (highest pitch) at the centre of the interval 19- thousand vibrations which may be taken
as 19,995. We find

\[ \begin{aligned}
\sum(x) &= 2{,}604, & \mu'_1(x) &= 0.770{,}642 \\
\sum(y) &= -708, & \mu'_1(y) &= -0.209{,}529 \\
\sum(x^2) &= 47{,}392, & \mu_2(x) &= 13.348{,}229 \\
\sum(y^2) &= 8894, & \mu_2(y) &= 2.504{,}904.
\end{aligned} \]

To find the product sum $\sum(xy)$ we require the product xy for each non-zero cell of the table.
These products are shown in brackets in Table 14.1. Then we have, reading the table
from left to right and from top to bottom,

\[ \begin{aligned}
\sum(xy) &= (1 \times 0) + (1 \times -14) + (1 \times -84) + (1 \times 18) + (1 \times 12) + (3 \times 6) + \text{etc.} \\
&= -12{,}535,
\end{aligned} \]

whence

\[ \operatorname{cov}(x, y) = \frac{-12{,}535}{3379} - (0.770{,}642)(-0.209{,}529) = -3.548{,}205. \]

Thus

\[ \rho = \frac{\operatorname{cov}(x, y)}{(\operatorname{var} x \operatorname{var} y)^{1/2}} = -0.6136, \]

a substantial negative correlation. The highest audible pitch decreases with increasing age.
We also find

\[ \beta_1 = \frac{\operatorname{cov}(x, y)}{\operatorname{var} y} = -1.417, \qquad \beta_2 = \frac{\operatorname{cov}(x, y)}{\operatorname{var} x} = -0.2658. \]

The regression equations, for the units of the table and with our arbitrary means, are then

\[ X - 0.7706 = -1.417\,(Y + 0.2095) \]

\[ Y + 0.2095 = -0.2658\,(X - 0.7706). \]
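
The arithmetic of this example can be retraced from the sums alone. The sketch below is an illustration, not the author's working. The value 47,392 for the sum of squares of x is the one quoted in Example 14.2, and the subtraction of 1/12 from each second moment is an assumption: it is the Sheppard correction in class-interval units, and it is what reproduces the variances quoted above.

```python
import math

N = 3379                        # total frequency of Table 14.1
sum_x, sum_y = 2604, -708       # first-order sums in working units
sum_x2, sum_y2 = 47392, 8894    # second-order sums (47,392 from Example 14.2)
sum_xy = -12535                 # product sum

mean_x = sum_x / N
mean_y = sum_y / N

# Assumption: Sheppard's correction of 1/12 applied to the grouped variances.
sheppard = 1 / 12
var_x = sum_x2 / N - mean_x ** 2 - sheppard
var_y = sum_y2 / N - mean_y ** 2 - sheppard

# Covariance by (14.12); no correction is needed for the product moment.
cov_xy = sum_xy / N - mean_x * mean_y

rho = cov_xy / math.sqrt(var_x * var_y)
beta_1 = cov_xy / var_y         # regression coefficient of x on y
beta_2 = cov_xy / var_x         # regression coefficient of y on x
```

Note that the product of the two regression coefficients equals the square of the correlation coefficient, which gives a quick check on the three figures.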


Example 14.2

The following device is often useful in calculating product moments. We recall that

\[ \begin{aligned}
2\sum(xy) &= \sum(x + y)^2 - \sum(x^2) - \sum(y^2) \\
&= \sum(x^2) + \sum(y^2) - \sum(x - y)^2.
\end{aligned} \]

Thus we may find $\sum(xy)$ from either $\sum(x + y)^2$ or $\sum(x - y)^2$, and these quantities are often
more convenient to calculate.

For example, in the preceding example we note that x + y is constant down the
diagonals running from the bottom right-hand to the top left-hand corner of the table.
Taking x + y to be zero in the cell centred at x = 20.5- and y = 19-, we may, in our
units, take it to be +1 in the cell 23.5-, 19-, and -1 in 17.5-, 19-, and so on. If we
sum up the diagonals we get:

x + y.    Sum.        x + y.    Sum.
  -9         1            4      124
  -8         1            5      112
  -7         5            6       90
  -6        11            7       59
  -5        20            8       38
  -4        93            9       23
  -3       207           10       21
  -2       434           11        9
  -1       594           12        9
   0       637           13        2
   1       418           14        4
   2       281           15        1
   3       185
                       Total    3379

The total is 3379, which provides a check on the work. We then find the sum of squares
in the ordinary way, obtaining

\[ \sum(x + y)^2 = 31{,}216 = \sum(x^2) + \sum(y^2) + 2\sum(xy). \]

Then

\[ \sum(xy) = \tfrac{1}{2}(31{,}216 - 47{,}392 - 8894) = -12{,}535 \text{ as before.} \]

The rest of the calculation follows the same lines as in Example 14.1.

We should have obtained the same result for $\sum(xy)$ if we had summed up the other
diagonal. Which diagonal is chosen depends on how the frequencies lie in the table.
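
The diagonal device lends itself to a short check. The sketch below is illustrative only. The dictionary `diagonal_totals` holds the diagonal sums of this example, read with x + y running from -9 to 15; that labelling is an inference, adopted because it reproduces the total of 3379, the sum of x + y (2,604 - 708 = 1,896) and the sum of squares 31,216.

```python
# Diagonal totals of Example 14.2: value of x + y -> frequency on that diagonal.
# The labelling from -9 to 15 is inferred from the check totals.
diagonal_totals = {
    -9: 1, -8: 1, -7: 5, -6: 11, -5: 20, -4: 93, -3: 207, -2: 434, -1: 594,
    0: 637, 1: 418, 2: 281, 3: 185, 4: 124, 5: 112, 6: 90, 7: 59, 8: 38,
    9: 23, 10: 21, 11: 9, 12: 9, 13: 2, 14: 4, 15: 1,
}

total = sum(diagonal_totals.values())
sum_s = sum(s * f for s, f in diagonal_totals.items())       # sum of (x + y)
sum_s2 = sum(s * s * f for s, f in diagonal_totals.items())  # sum of (x + y)^2

sum_x2, sum_y2 = 47392, 8894              # from the border columns
sum_xy = (sum_s2 - sum_x2 - sum_y2) / 2   # the identity quoted in the example
```

The same three lines of arithmetic apply to the other diagonal if x - y is tabulated instead, with the sign of the final bracket reversed.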


Example 14.3

In the foregoing the regression lines and the correlation coefficient were arrived at
from a consideration of grouped frequencies in a bivariate table. We may, however, apply
the same ideas to ungrouped material. There are no longer means of arrays, but the
regression lines are still to be interpreted as the lines of best fit to the N pairs of variate-
values and the correlation coefficient as a measure of relationship between variates.

Table 14.4 shows the yields of wheat and potatoes in 48 counties of England in 1936.
In this particular case it is hardly worth while taking an arbitrary origin other than that
given. We find (x = wheat, y = potatoes)

\[ \begin{aligned}
\sum(x) &= 758.0, & \mu'_1(x) &= 15.791{,}667 \\
\sum(y) &= 291.1, & \mu'_1(y) &= 6.064{,}583 \\
\sum(x^2) &= 12{,}170.48, & \mu_2(x) &= 4.174{,}930 \\
\sum(y^2) &= 1791.03, & \mu_2(y) &= 0.533{,}958 \\
\sum(xy) &= 4612.64, & \mu_{11}(x, y) &= 0.326{,}888
\end{aligned} \]

\[ \rho = \frac{0.326{,}888}{\sqrt{(4.174{,}930 \times 0.533{,}958)}} = 0.2189, \]

\[ \beta_1 = 0.6122, \qquad \beta_2 = 0.0783. \]




TABLE 14.4

Yields of Wheat and Potatoes in 48 Counties in England in 1936.


County. 

Wheat 
(cwta. 
per acre). 

Potatoes 
(tons 
per acre). 

County. 

Wlieat 
(cwts. 
per acre). 

Potatoes 
(tons 
per acre). 

Bedfordshire .... 

160 

63 

Northamptonshire . 

14-3 

4-9 

Huntingdonshire 

160 

6-6 

Peterborough .... 

14*4 

5*6 

Cambricigeshiro .... 

16-4 

61 

Buckinghamshire . 

16 2 

6 4 

Ely. 

205 

5 6 

Oxfordshire. 

141 

6*9 

Suffolk, West .... 

18-2 

6-9 

Warwickshire .... 

16-4 

5*6 

Suffolk, East .... 

16-3 

61 

Shropshire. 

16-5 

6*1 

Essex. 

17-7 

6-4 

Worcestershire .... 

142 

5-7 

Hertfordshire .... 

15 3 

6-3 

Gloucestershire .... 

13-2 

6*0 

Middlesex. 

16-5 

7-8 

Wiltshire. 

13-8 

6*6 

Norfolk. 

16-9 

8 3 

Herefordshire .... 

14-4 

6*2 

Lines. (Holland) 

21*8 

5*7 

Somersetshire .... 

1 134 

6 2 

„ (KoHteven) . 

15*5 

6-2 

Dorsetshire ..... 

1 11-2 

6*6 

„ (Lindsey) . . . 

15-8 

60 

Devonshire. 

14*4 

6*8 

Yorkshire (East Riding) . 

161 

6*1 

('Cornwall. 

1 16*4 

6*3 

Kent. 

18'5 

6 6 

Northumberland 

18-5 

6 3 

Surrey. 

12*7 

4-8 

Durham. 

164 

5*8 

Sussex, East .... 

15 7 

4-9 

Yorkshire (North Riding). 

170 

6*9 

Sussex, West .... 

14 3 I 

51 

„ (West Riding) . 

16-9 

6 6 

Berkshire. 

13-8 1 

6 5 

Cumberland. 

17 6 

6*8 

Hampshire. 

128 I 

6 7 

Westmorland .... 

16'8 

6*7 

Isle of Wight .... 

120 

6*5 

Lancashire. 

19-2 

7*2 

Nottinghainshiro 

15-6 i 

5 2 

Cheshire. 

17*7 

6*6 

T.»eice8tershire .... 

15 8 ! 

5 2 

Derbyshire. 

1 15*2 

6*4 

Rutland. 

106 

71 

Staffordshire .... 

j 

! 171 

6 3 




Fig. 14.3. Scatter Diagram of the Data of Table 14.4.
(Horizontal axis: wheat yield, cwts. per acre.)














The regression lines are

\[ X - 15.792 = 0.6122\,(Y - 6.065) \]

\[ Y - 6.065 = 0.0783\,(X - 15.792). \]

The data are shown in a graphical form in Fig. 14.3. Corresponding to each pair of
values (x, y) there is plotted a point with those values as abscissa and ordinate. The
totality of points forms what is known, for obvious reasons, as a scatter diagram. The
two regression lines are also shown.


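The Bivariate Normal Distribution

14.8. The distribution

\[ dF = \frac{1}{2\pi\sigma_1\sigma_2(1 - \rho^2)^{1/2}} \exp\left\{-\frac{1}{2(1 - \rho^2)}\left(\frac{x^2}{\sigma_1^2} - \frac{2\rho xy}{\sigma_1\sigma_2} + \frac{y^2}{\sigma_2^2}\right)\right\} dx\,dy \tag{14.13} \]

has already arisen (5.24) as the natural extension to two variates of the univariate normal
distribution. In writing $\rho$ in (14.13) we have anticipated a result which will now be proved,
namely that $\rho$ in that equation is in fact the correlation coefficient of the distribution.
The characteristic function of (14.13) is

\[ \phi(t, u) = \exp\{-\tfrac{1}{2}(t^2\sigma_1^2 + 2tu\rho\sigma_1\sigma_2 + u^2\sigma_2^2)\}, \]

whence

\[ \operatorname{var} x = \sigma_1^2, \qquad \operatorname{var} y = \sigma_2^2, \qquad \operatorname{cov}(x, y) = \rho\sigma_1\sigma_2 \]

and the correlation coefficient is $\dfrac{\rho\sigma_1\sigma_2}{\sqrt{(\sigma_1^2\sigma_2^2)}} = \rho$, as stated.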


The exponent in (14.13) may be written

\[ -\frac{1}{2(1 - \rho^2)}\left(\frac{x}{\sigma_1} - \frac{\rho y}{\sigma_2}\right)^2 - \frac{y^2}{2\sigma_2^2}. \tag{14.14} \]

Thus for any fixed y, x is distributed normally about a mean given by

\[ \frac{\rho\sigma_1 y}{\sigma_2}, \]

and hence the means of the x-arrays of infinite thinness lie on the line

\[ \frac{X}{\sigma_1} = \frac{\rho Y}{\sigma_2} \tag{14.15} \]

and this is the regression of x on y. Similarly the means of y-arrays lie on

\[ \frac{Y}{\sigma_2} = \frac{\rho X}{\sigma_1}, \tag{14.16} \]

the regression of y on x. Thus the regression lines of the bivariate normal surface are
exactly linear.

Furthermore, from (14.14) it is seen that the variance of an array of x for fixed y is

\[ \sigma_1^2(1 - \rho^2), \]

i.e. is independent of y. Similarly the variance of an array of y is

\[ \sigma_2^2(1 - \rho^2) \tag{14.17} \]

and is independent of x.

Thus the variance of x-arrays is the same for all arrays; and so for y. Distributions
for which this is true are called homoscedastic.

If $\rho = 0$ the distribution becomes the product of two normal distributions and the
variates are independent. Thus two uncorrelated variates which are jointly normally
distributed are independent.
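
The statements of this section about array means and array variances amount to a factorisation of the bivariate normal density into the distribution of y and the distribution of x for fixed y. The sketch below checks that factorisation at a few points. It is an illustration only: the function names and the parameter values are hypothetical, and the means are taken at the origin as in (14.13).

```python
import math


def bivariate_normal_pdf(x, y, s1, s2, rho):
    """Frequency function of (14.13), with means at the origin."""
    q = (x / s1) ** 2 - 2 * rho * x * y / (s1 * s2) + (y / s2) ** 2
    norm = 2 * math.pi * s1 * s2 * math.sqrt(1 - rho ** 2)
    return math.exp(-q / (2 * (1 - rho ** 2))) / norm


def normal_pdf(z, mean, var):
    """Univariate normal frequency function."""
    return math.exp(-(z - mean) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)


def factorised_pdf(x, y, s1, s2, rho):
    """Distribution of y multiplied by the array distribution of x for fixed y."""
    array_mean = rho * s1 * y / s2           # lies on the regression of x on y
    array_var = s1 ** 2 * (1 - rho ** 2)     # the same for every array
    return normal_pdf(y, 0.0, s2 ** 2) * normal_pdf(x, array_mean, array_var)


# Hypothetical parameter values and evaluation points.
s1, s2, rho = 1.5, 0.8, -0.6
points = [(0.0, 0.0), (1.0, -0.5), (-2.0, 0.3), (0.7, 1.1)]
pairs = [(bivariate_normal_pdf(x, y, s1, s2, rho),
          factorised_pdf(x, y, s1, s2, rho)) for x, y in points]
```

The array variance in `factorised_pdf` does not involve y, which is the homoscedastic property in computational form.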

14.9. A criterion for linearity of regression for the general bivariate surface may
be obtained in terms of moments or cumulants. Taking an origin at the means of the two
variates we have, if the regression of x on y is linear,

\[ \bar{x} = \beta_1 y \]

or, if f(x, y) is the frequency function,

\[ \int_{-\infty}^{\infty} x f(x, y)\,dx = \beta_1 y \int_{-\infty}^{\infty} f(x, y)\,dx. \]

Multiplying both sides by $y^p$ and integrating over the range of y, we have

\[ \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} y^p x f(x, y)\,dx\,dy = \beta_1 \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} y^{p+1} f(x, y)\,dx\,dy \]

or

\[ \mu_{1,p} = \beta_1\,\mu_{0,p+1}. \tag{14.18} \]

In particular $\mu_{11} = \beta_1\mu_{02}$, so that

\[ \mu_{02}\,\mu_{1,p} = \mu_{11}\,\mu_{0,p+1}, \tag{14.19} \]

a condition on the central moments. Recalling the definition of bivariate cumulants
we see that the bivariate mean moments are related to the cumulants in the same way as
univariate moments to cumulants, and hence we also have

\[ \kappa_{1,p} = \beta_1\,\kappa_{0,p+1} \tag{14.20} \]

\[ \kappa_{02}\,\kappa_{1,p} = \kappa_{11}\,\kappa_{0,p+1}. \tag{14.21} \]

Similarly, if the regression of y on x is linear we shall have

\[ \left.\begin{aligned}
\mu_{20}\,\mu_{p,1} &= \mu_{11}\,\mu_{p+1,0} \\
\kappa_{20}\,\kappa_{p,1} &= \kappa_{11}\,\kappa_{p+1,0}
\end{aligned}\right\} \tag{14.22} \]

Under certain conditions these equations are sufficient for linearity of regression. For
instance, if (14.18) is true for all p, then

\[ \int_{-\infty}^{\infty} y^p\,dy\left\{\int_{-\infty}^{\infty}(x - \beta_1 y) f(x, y)\,dx\right\} = 0. \]

The expression in curly brackets is thus a function, not necessarily positive, whose moments
vanish, and under certain general conditions this implies that the function itself vanishes,
i.e. that

\[ \int_{-\infty}^{\infty}(x - \beta_1 y) f(x, y)\,dx = 0 \]

or $\bar{x} = \beta_1 y$,

so that the regression of x on y is linear.



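Example 14.4 (from Wicksell, Biometrika, 25, 136)

Consider the bivariate distribution of the squares of variates x, y which are distributed
in the bivariate normal form. The characteristic function of these variates is
proportional to

\[ \begin{aligned}
&\int_{-\infty}^{\infty}\int_{-\infty}^{\infty} \exp\left[itx^2 + iuy^2 - \frac{1}{2(1 - \rho^2)}\left(\frac{x^2}{\sigma_1^2} - \frac{2\rho xy}{\sigma_1\sigma_2} + \frac{y^2}{\sigma_2^2}\right)\right] dx\,dy \\
&\quad= \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} \exp\left[-\frac{1}{2(1 - \rho^2)}\left\{\frac{x^2}{\sigma_1^2}\{1 - 2\sigma_1^2(1 - \rho^2)it\} - \frac{2\rho xy}{\sigma_1\sigma_2} + \frac{y^2}{\sigma_2^2}\{1 - 2\sigma_2^2(1 - \rho^2)iu\}\right\}\right] dx\,dy.
\end{aligned} \]

This is proportional to (compare Exercise 1.5)

\[ \begin{vmatrix}
\dfrac{1}{\sigma_1^2}\{1 - 2\sigma_1^2(1 - \rho^2)it\} & -\dfrac{\rho}{\sigma_1\sigma_2} \\
-\dfrac{\rho}{\sigma_1\sigma_2} & \dfrac{1}{\sigma_2^2}\{1 - 2\sigma_2^2(1 - \rho^2)iu\}
\end{vmatrix}^{-1/2} \]

which, except for constants, reduces to

\[ \{(1 - 2\sigma_1^2 it)(1 - 2\sigma_2^2 iu) - 4\rho^2\sigma_1^2\sigma_2^2(it)(iu)\}^{-1/2}. \]

This is the characteristic function, for when t = u = 0 it reduces to unity.

Now the frequency-distribution represented by this function is evidently not normal;
but its regressions are linear, for we have, taking logarithms,

\[ K = -\tfrac{1}{2}\log\{(1 - 2\sigma_1^2 it)(1 - 2\sigma_2^2 iu) - 4\rho^2\sigma_1^2\sigma_2^2(it)(iu)\} \]

giving, on identifying coefficients,

\[ \begin{aligned}
\kappa_{p,0} &= \tfrac{1}{2}(p - 1)!\,(2\sigma_1^2)^p \\
\kappa_{p,1} &= 2\rho^2\sigma_1^2\sigma_2^2(2\sigma_1^2)^{p-1}\,p! \\
\kappa_{11} &= 2\rho^2\sigma_1^2\sigma_2^2;
\end{aligned} \]

and hence

\[ \kappa_{20}\,\kappa_{p,1} = \kappa_{11}\,\kappa_{p+1,0}. \]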
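
The cumulant condition for linear regression in this example can be confirmed numerically for the first few orders. The sketch below is only an illustration: it evaluates the closed forms $\kappa_{p,0} = \tfrac{1}{2}(p-1)!\,(2\sigma_1^2)^p$ and $\kappa_{p,1} = 2\rho^2\sigma_1^2\sigma_2^2(2\sigma_1^2)^{p-1}p!$ for hypothetical parameter values, and the function names are not from the text.

```python
from math import factorial, isclose


def kappa_p0(p, sigma1):
    """Cumulant of order (p, 0) of the squared variates."""
    return 0.5 * factorial(p - 1) * (2 * sigma1 ** 2) ** p


def kappa_p1(p, sigma1, sigma2, rho):
    """Cumulant of order (p, 1) of the squared variates."""
    return (2 * rho ** 2 * sigma1 ** 2 * sigma2 ** 2
            * (2 * sigma1 ** 2) ** (p - 1) * factorial(p))


# Hypothetical parameter values.
sigma1, sigma2, rho = 1.5, 0.8, 0.6

kappa_20 = kappa_p0(2, sigma1)
kappa_11 = kappa_p1(1, sigma1, sigma2, rho)

# Check kappa_20 * kappa_{p,1} = kappa_11 * kappa_{p+1,0} for p = 1, ..., 7.
criterion_holds = all(
    isclose(kappa_20 * kappa_p1(p, sigma1, sigma2, rho),
            kappa_11 * kappa_p0(p + 1, sigma1))
    for p in range(1, 8)
)
```

The check covers only finitely many orders and so illustrates, rather than proves, the linearity of the regression.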


Sampling of Regression and Correlation Coefficients 

14.10. We now turn to consider the sampling problems associated with the coefficients
of correlation and regression.

First of all, as to standard errors. In Example 9.6 on page 211 we have anticipated
the determination of the sampling variance of the correlation coefficient itself, obtaining
the result for the normal case

\[ \operatorname{var} r = \frac{1}{n}(1 - \rho^2)^2. \tag{14.23} \]

Here, as usual, the Roman r is written for the value of $\rho$ in the sample and n is the number
in the sample. The result of (14.23) is not of great value, since the distribution of r tends
to normality very slowly if $\rho$ is not close to zero. It is probably as well not to use (14.23)
unless n is greater than 500.



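In the manner of Chapter 9 we have

\[ \frac{db_2}{b_2} = \frac{dm_{11}}{m_{11}} - \frac{dm_{20}}{m_{20}} \]

giving

\[ \frac{\operatorname{var} b_2}{b_2^2} = \frac{\operatorname{var} m_{11}}{m_{11}^2} + \frac{\operatorname{var} m_{20}}{m_{20}^2} - \frac{2\operatorname{cov}(m_{11}, m_{20})}{m_{11}m_{20}}. \tag{14.24} \]

Substituting from (9.16) and (9.17), and writing the sampling values m instead of the parent
$\mu$'s, we have

\[ \operatorname{var} b_2 = \frac{b_2^2}{n}\left(\frac{m_{22}}{m_{11}^2} + \frac{m_{40}}{m_{20}^2} - \frac{2m_{31}}{m_{11}m_{20}}\right) \tag{14.25} \]

or, for the normal case, on using the values of Example 3.15,

\[ \operatorname{var} b_2 = \frac{b_2^2}{n}\left(\frac{1 - r^2}{r^2}\right) = \frac{1}{n}\frac{\operatorname{var} y}{\operatorname{var} x}(1 - r^2). \tag{14.26} \]

Similarly

\[ \operatorname{var} b_1 = \frac{1}{n}\frac{\operatorname{var} x}{\operatorname{var} y}(1 - r^2). \tag{14.27} \]

To our order of approximation it is indifferent whether we write $1 - r^2$ or $1 - \rho^2$
on the right-hand side of these equations.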
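
The reduction of the general expression (14.25) to the normal-case form (14.26) can be verified with numbers. The sketch below is an illustration under stated assumptions: the function names and parameter values are hypothetical, and the fourth-order moments used are the standard bivariate normal values $\mu_{40} = 3\mu_{20}^2$, $\mu_{31} = 3\mu_{20}\mu_{11}$ and $\mu_{22} = \mu_{20}\mu_{02} + 2\mu_{11}^2$.

```python
import math


def var_b2_general(n, m20, m11, m22, m40, m31):
    """Sampling variance of b2 = m11/m20 by (14.25)."""
    b2 = m11 / m20
    return b2 ** 2 / n * (m22 / m11 ** 2 + m40 / m20 ** 2
                          - 2 * m31 / (m11 * m20))


def var_b2_normal(n, var_x, var_y, r):
    """Sampling variance of b2 in the normal case, by (14.26)."""
    return (var_y / var_x) * (1 - r ** 2) / n


# Hypothetical values: standard deviations, correlation and sample size.
s1, s2, rho, n = 2.0, 3.0, 0.6, 100

m20, m02, m11 = s1 ** 2, s2 ** 2, rho * s1 * s2
# Fourth-order moments of the bivariate normal distribution.
m40 = 3 * m20 ** 2
m31 = 3 * m20 * m11
m22 = m20 * m02 + 2 * m11 ** 2

general = var_b2_general(n, m20, m11, m22, m40, m31)
normal = var_b2_normal(n, m20, m02, rho)
```

With these values both expressions give 0.0144, so that the standard error of the regression coefficient of y on x would be 0.12.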


Example 14.5

In the data of Table 14.3 (height of fathers and height of sons) we find r = +0.51,
for n = 1078. From inspection of the table we see that the distribution is reasonably
close to the normal type, and in this case n is large enough to justify the use of the standard
error. We have then

\[ \operatorname{var} r = \frac{\{1 - (0.51)^2\}^2}{1078} = 0.000{,}508. \]

Thus the standard error is about 0.023. The correlation is thus undoubtedly significant,
if the data were obtained by random sampling. It is improbable that the parent
correlation $\rho$ lies outside the range 0.51 ± 0.05, and very improbable that it lies outside the
range 0.51 ± 0.075.
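
The arithmetic of this example takes only a few lines. The sketch below is illustrative; the function name `var_r_large_sample` is a hypothetical label for formula (14.23), with the sample value of r substituted for the parent correlation as in the example.

```python
import math


def var_r_large_sample(rho, n):
    """Large-sample variance of r in the normal case, formula (14.23)."""
    return (1 - rho ** 2) ** 2 / n


r, n = 0.51, 1078            # values quoted for Table 14.3
var_r = var_r_large_sample(r, n)
standard_error = math.sqrt(var_r)

# The two ranges quoted in the example, in units of the standard error.
multiple_narrow = 0.05 / standard_error
multiple_wide = 0.075 / standard_error
```

The quoted ranges of 0.05 and 0.075 correspond to a little over two and a little over three standard errors respectively.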


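Estimates of Correlation and Regression Coefficients in Normal Samples

14.11. In large-sample theory the sample values of the correlation and regression
coefficients may be taken as estimates of the population values in the usual way. They
can also be used in small-sample theory and it may, in fact, be shown that they are estimates
giving maximum likelihood to samples from a bivariate normal population.

The joint probability of n sample values $(x_1, y_1) \dots (x_n, y_n)$ from a bivariate normal
population with means $m_1$ and $m_2$ is

\[ dF = \frac{1}{(2\pi)^n\sigma_1^n\sigma_2^n(1 - \rho^2)^{n/2}} \exp\left[-\frac{1}{2(1 - \rho^2)}\left\{\frac{\sum(x - m_1)^2}{\sigma_1^2} - \frac{2\rho\sum(x - m_1)(y - m_2)}{\sigma_1\sigma_2} + \frac{\sum(y - m_2)^2}{\sigma_2^2}\right\}\right] dx_1\,dy_1 \dots dx_n\,dy_n. \tag{14.28} \]

Writing A, B and C for $\sum(x - m_1)^2/\sigma_1^2$, $\sum(x - m_1)(y - m_2)/(\sigma_1\sigma_2)$ and $\sum(y - m_2)^2/\sigma_2^2$,
the likelihood function may then be written

\[ L \propto \frac{1}{\sigma_1^n\sigma_2^n(1 - \rho^2)^{n/2}} \exp\left[-\frac{1}{2(1 - \rho^2)}(A - 2\rho B + C)\right] \tag{14.29} \]

and thus, for the maximisation of log L we have

\[ \frac{1}{L}\frac{\partial L}{\partial m_1} = \frac{1}{1 - \rho^2}\left\{\frac{1}{\sigma_1^2}\sum(x - m_1) - \frac{\rho}{\sigma_1\sigma_2}\sum(y - m_2)\right\} = 0 \]

giving

\[ \frac{1}{\sigma_1}\sum(x - m_1) - \frac{\rho}{\sigma_2}\sum(y - m_2) = 0. \tag{14.30} \]

Similarly from $\dfrac{1}{L}\dfrac{\partial L}{\partial m_2} = 0$ we have

\[ \frac{\rho}{\sigma_1}\sum(x - m_1) - \frac{1}{\sigma_2}\sum(y - m_2) = 0. \tag{14.31} \]

Thus from (14.30) and (14.31), $\rho$ not being unity in general,

\[ \sum(x - m_1) = \sum(y - m_2) = 0, \]

\[ m_1 = \frac{1}{n}\sum(x), \qquad m_2 = \frac{1}{n}\sum(y), \tag{14.32} \]

so that our estimates of the means are the means of the sample.

We also find, equating $\dfrac{\partial}{\partial\sigma_1}\log L$, $\dfrac{\partial}{\partial\sigma_2}\log L$ and $\dfrac{\partial}{\partial\rho}\log L$ to zero and cancelling factors
in $\sigma_1$, $\sigma_2$ and $(1 - \rho^2)$ respectively,

\[ \left.\begin{aligned}
-n + \frac{1}{1 - \rho^2}(A - \rho B) &= 0 \\
-n + \frac{1}{1 - \rho^2}(-\rho B + C) &= 0 \\
-n + \frac{1}{1 - \rho^2}(A - 2\rho B + C) - \frac{B}{\rho} &= 0
\end{aligned}\right\} \tag{14.33} \]

whence

\[ \rho = \frac{B}{n}, \qquad 1 = \frac{A}{n} = \frac{C}{n}, \]

giving for the estimates of $\sigma_1^2$, $\sigma_2^2$ and $\rho$

\[ \left.\begin{aligned}
\sigma_1^2 &= \frac{1}{n}\sum(x - m_1)^2 \\
\sigma_2^2 &= \frac{1}{n}\sum(y - m_2)^2 \\
\rho &= \frac{\sum(x - m_1)(y - m_2)}{\{\sum(x - m_1)^2\sum(y - m_2)^2\}^{1/2}}
\end{aligned}\right\} \tag{14.34} \]

which, on substituting the estimates of $m_1$ and $m_2$ given by (14.32), become the sample
variances and correlation coefficient.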
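
The maximum-likelihood estimates can be computed directly, and the conditions A/n = C/n = 1, B/n = ρ then serve as a check. In the sketch below, which is illustrative only, A, B and C denote the three sums of the exponent of the likelihood divided by $\sigma_1^2$, $\sigma_1\sigma_2$ and $\sigma_2^2$ respectively; the function name `ml_estimates` and the data are hypothetical.

```python
import math


def ml_estimates(xs, ys):
    """Maximum-likelihood estimates (m1, m2, var1, var2, rho) for a bivariate
    normal sample, following (14.32) and (14.34); divisors are n, not n - 1."""
    n = len(xs)
    m1 = sum(xs) / n
    m2 = sum(ys) / n
    sxx = sum((x - m1) ** 2 for x in xs)
    syy = sum((y - m2) ** 2 for y in ys)
    sxy = sum((x - m1) * (y - m2) for x, y in zip(xs, ys))
    return m1, m2, sxx / n, syy / n, sxy / math.sqrt(sxx * syy)


# Hypothetical sample of five pairs.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.0, 1.0, 4.0, 3.0, 5.0]
n = len(xs)
m1, m2, var1, var2, rho = ml_estimates(xs, ys)

# The quantities A, B, C evaluated at the estimates.
A = sum((x - m1) ** 2 for x in xs) / var1
B = sum((x - m1) * (y - m2) for x, y in zip(xs, ys)) / math.sqrt(var1 * var2)
C = sum((y - m2) ** 2 for y in ys) / var2
```

The divisor n, rather than n - 1, is what the likelihood equations give; the two differ negligibly in large samples.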


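Distribution of Sample Means, Variances and Covariance in Normal Samples

14.12. In accordance with the result of the previous section we will take our estimates
of the parameters $m_1$, $m_2$, $\sigma_1^2$, $\sigma_2^2$ and $\rho$ to be the corresponding sample values $\bar{x}$, $\bar{y}$, $s_1^2$, $s_2^2$
and r. The joint distribution of the sample values is given by (14.28) and it is remarkable
that the exponent in that expression can be expressed solely in terms of the five parameters
and their estimates. We have, in fact,

\[ \begin{aligned}
&\frac{\sum(x - m_1)^2}{\sigma_1^2} - 2\rho\frac{\sum(x - m_1)(y - m_2)}{\sigma_1\sigma_2} + \frac{\sum(y - m_2)^2}{\sigma_2^2} \\
&\quad= \frac{\sum(x - \bar{x} + \bar{x} - m_1)^2}{\sigma_1^2} + \text{two similar terms} \\
&\quad= \frac{\sum(x - \bar{x})^2}{\sigma_1^2} + \frac{n(\bar{x} - m_1)^2}{\sigma_1^2} + \text{four similar terms,}
\end{aligned} \]

the product terms vanishing because $\sum(x - \bar{x}) = \sum(y - \bar{y}) = 0$,

\[ = n\left\{\frac{(\bar{x} - m_1)^2}{\sigma_1^2} - \frac{2\rho(\bar{x} - m_1)(\bar{y} - m_2)}{\sigma_1\sigma_2} + \frac{(\bar{y} - m_2)^2}{\sigma_2^2}\right\} + n\left\{\frac{s_1^2}{\sigma_1^2} - \frac{2\rho r s_1 s_2}{\sigma_1\sigma_2} + \frac{s_2^2}{\sigma_2^2}\right\}. \tag{14.35} \]

We proceed to find the joint distribution of the five statistics, and to do so require
to express the frequency element (14.28) in terms of them. The non-differential part of
that element is given, apart from constants, by the exponential of $-1/\{2(1 - \rho^2)\}$ times
(14.35). It remains to express in the requisite
form the volume element $dx_1 \dots dx_n\,dy_1 \dots dy_n$.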

Generalising the geometrical approach of Chapter 10, we may imagine a sample space 
of 2n dimensions, n for x and for y. The sample point may vary in the a;-space and the 
j/-space, but not independently so. In fact, if P represents the point (.Ti . . . x,^) in the 
.T-space and ^the point (j/i . . . yn) in the t/-space, and if Oi, O^are the points (x . • . i), 
(y ♦ • • y)» fh^n for any given r we have 


T ^ - ^){y - y) ^ r(x - x){y - y) _ 

ns^8^ mx - x)^^{y 

and thus r is the cosine of the angle, say 6, between POi and QO*, so that if P and r are 



340 


PRODUCT-MOMENT CORRELATION 


fixed Q varies on the cone in the ^-spaoe obtained by rotating OjQ such that the angle 
made' with OiP is cjonstant. 

The element in the a^-spaoe is proportional to Si”"* dsi dH as was seen in Example 10.6. 
For given r, j? and «, the point Q varies on the zone of the hypersphere of radius 
centre ^ and (» — 1) dimensions. This zone has radius sin 6= (1 — »•*)* and 

width d6 — and thus its content is proportional to {s,v'»(l — 

~^^~,that is, to «,"-*(! - r»)”. 

Thus the volume element may be written 

dv oc dsi dx ^ dy (1 — dr 

n-i 

r®) "i dr dx dy 


oz ^ dsi dSi (1 


(14.36) 


and the joint frequency element of the five variables is then proportional to 


exp 


2(1 


r/ 

- p*)Ll 


{x - m,)* „ (:r — mi)(^ 


O'! 


2p. 


(TiOa 


”>■%) I (y 


i J 


+ a-v, 






}] 


dv (14.37) 


This fundament/al result is due to R. A. Fisher (1915). 


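14.13. One important property of (14.37) may be remarked. The distribution may be factorised into two parts, one containing only $\bar{x}$ and $\bar{y}$ and the other only $s_1$, $s_2$ and r, namely (except for constants)

$$dF \propto \exp\left[-\frac{n}{2(1-\rho^2)}\left\{\frac{(\bar{x}-m_1)^2}{\sigma_1^2} - \frac{2\rho(\bar{x}-m_1)(\bar{y}-m_2)}{\sigma_1\sigma_2} + \frac{(\bar{y}-m_2)^2}{\sigma_2^2}\right\}\right] d\bar{x}\,d\bar{y} \tag{14.38}$$

and

$$dF \propto \exp\left[-\frac{n}{2(1-\rho^2)}\left\{\frac{s_1^2}{\sigma_1^2} - \frac{2\rho r s_1 s_2}{\sigma_1\sigma_2} + \frac{s_2^2}{\sigma_2^2}\right\}\right] s_1^{n-2} s_2^{n-2} (1-r^2)^{(n-4)/2}\,ds_1\,ds_2\,dr \tag{14.39}$$

Thus we see that in normal samples the distribution of means is entirely independent of that of variances and covariance.

Before leaving (14.38) we may also note that the means are themselves distributed in the bivariate normal form, with mean $(\bar{x}) = m_1$, mean $(\bar{y}) = m_2$, var $(\bar{x}) = \sigma_1^2/n$, var $(\bar{y}) = \sigma_2^2/n$ (all of which results are already familiar), and

$$\operatorname{cov}(\bar{x}, \bar{y}) = \frac{\rho\sigma_1\sigma_2}{n} \tag{14.40}$$

so that the correlation between $\bar{x}$ and $\bar{y}$ is $\rho$, the correlation in the parent population.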

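14.14. We may now use (14.39) to obtain the distribution of the correlation coefficient, namely, by integrating with respect to $s_1$ and $s_2$ from 0 to ∞. Let us first of all evaluate the constant to be attached to it from the consideration that $\int dF = 1$.

Make the variate transformation

$$a = \frac{s_1^2}{\sigma_1^2}\cdot\frac{n}{2(1-\rho^2)}, \qquad b = \frac{r s_1 s_2}{\sigma_1\sigma_2}\cdot\frac{n}{2(1-\rho^2)}, \qquad c = \frac{s_2^2}{\sigma_2^2}\cdot\frac{n}{2(1-\rho^2)} \tag{14.41}$$

Writing $\kappa = \dfrac{n}{2(1-\rho^2)}$ for brevity, we have for the Jacobian of the transformation

$$\frac{\partial(a, b, c)}{\partial(s_1, s_2, r)} = \begin{vmatrix} \dfrac{2s_1\kappa}{\sigma_1^2} & 0 & 0 \\[2mm] \dfrac{r s_2\kappa}{\sigma_1\sigma_2} & \dfrac{r s_1\kappa}{\sigma_1\sigma_2} & \dfrac{s_1 s_2\kappa}{\sigma_1\sigma_2} \\[2mm] 0 & \dfrac{2s_2\kappa}{\sigma_2^2} & 0 \end{vmatrix} = \frac{s_1^2 s_2^2 n^3}{2\sigma_1^3\sigma_2^3(1-\rho^2)^3} = \frac{2acn}{\sigma_1\sigma_2(1-\rho^2)}$$

(in absolute value), and also the relation

$$1 - r^2 = 1 - \frac{b^2}{ac}.$$

The integral then becomes

$$\begin{aligned} &\int \exp[-a + 2\rho b - c]\left\{\frac{2\sigma_1\sigma_2(1-\rho^2)}{n}\right\}^{n-2}(ac)^{(n-2)/2}\left(1 - \frac{b^2}{ac}\right)^{(n-4)/2}\frac{\sigma_1\sigma_2(1-\rho^2)}{2acn}\,da\,db\,dc \\ &\quad= \frac{\sigma_1^{n-1}\sigma_2^{n-1}\,2^{n-3}(1-\rho^2)^{n-1}}{n^{n-1}}\int \exp[-a + 2\rho b - c]\,(ac - b^2)^{(n-4)/2}\,da\,db\,dc \end{aligned} \tag{14.42}$$

where the limits of a and c are 0 to ∞ and those of b are $\pm\sqrt{ac}$. This integral may be evaluated in terms of the Γ-function. Putting $\xi = a - \dfrac{b^2}{c}$ we find

$$\begin{aligned} &\int \exp(-\xi)\,\xi^{(n-4)/2}\exp\left(-\frac{b^2}{c} + 2\rho b - c\right) c^{(n-4)/2}\,d\xi\,db\,dc, \qquad 0 < \xi < \infty,\; -\infty < b < \infty,\; 0 < c < \infty \\ &\quad= \Gamma\!\left(\frac{n-2}{2}\right)\int \exp\left[-\frac{1}{c}\{(b-\rho c)^2 + (1-\rho^2)c^2\}\right] c^{(n-4)/2}\,db\,dc \\ &\quad= \Gamma\!\left(\frac{n-2}{2}\right)\sqrt{\pi}\int \exp\{-(1-\rho^2)c\}\,c^{(n-3)/2}\,dc \\ &\quad= \frac{\sqrt{\pi}\,\Gamma\!\left(\frac{n-2}{2}\right)\Gamma\!\left(\frac{n-1}{2}\right)}{(1-\rho^2)^{(n-1)/2}} = \frac{\pi\,\Gamma(n-2)}{2^{n-3}(1-\rho^2)^{(n-1)/2}} \end{aligned} \tag{14.43}$$

Collecting up the terms from (14.42) and (14.43) we find for the joint distribution of $s_1$, $s_2$ and r

$$dF = \frac{n^{n-1}}{\pi\,\sigma_1^{n-1}\sigma_2^{n-1}(1-\rho^2)^{(n-1)/2}\,\Gamma(n-2)} \exp\left[-\frac{n}{2(1-\rho^2)}\left(\frac{s_1^2}{\sigma_1^2} - \frac{2\rho r s_1 s_2}{\sigma_1\sigma_2} + \frac{s_2^2}{\sigma_2^2}\right)\right] s_1^{n-2} s_2^{n-2}(1-r^2)^{(n-4)/2}\,ds_1\,ds_2\,dr \tag{14.44}$$

Now put

$$\zeta = \frac{s_1 s_2}{\sigma_1\sigma_2}, \qquad e^{z} = \frac{s_1\sigma_2}{\sigma_1 s_2}, \qquad r = r.$$

We find, for the Jacobian of the transformation,

$$\frac{\partial(\zeta, z, r)}{\partial(s_1, s_2, r)} = \begin{vmatrix} \dfrac{s_2}{\sigma_1\sigma_2} & \dfrac{s_1}{\sigma_1\sigma_2} & 0 \\[2mm] \dfrac{1}{s_1} & -\dfrac{1}{s_2} & 0 \\[2mm] 0 & 0 & 1 \end{vmatrix} = \frac{2}{\sigma_1\sigma_2}$$

(in absolute value). The exponent in (14.44) becomes

$$-\frac{n\zeta}{1-\rho^2}(\cosh z - \rho r)$$

and after a little reduction the distribution becomes

$$dF = \frac{n^{n-1}}{\pi(1-\rho^2)^{(n-1)/2}\,\Gamma(n-2)} \exp\left[-\frac{n\zeta}{1-\rho^2}(\cosh z - \rho r)\right]\zeta^{n-2}\,d\zeta\,dz\,(1-r^2)^{(n-4)/2}\,dr,$$

where, the expression being even in z, the range of z is taken as 0 to ∞ and the ordinate doubled accordingly.

On integration with respect to $\zeta$ we have

$$dF = \frac{(1-\rho^2)^{(n-1)/2}\,\Gamma(n-1)}{\pi\,\Gamma(n-2)}\cdot\frac{(1-r^2)^{(n-4)/2}}{(\cosh z - \rho r)^{n-1}}\,dz\,dr.$$

Putting $-\rho r = \cos\theta$ we have, since

$$\int_0^\infty \frac{dz}{\cosh z + \cos\theta} = \frac{\theta}{\sin\theta},$$

$$\begin{aligned} dF &= \frac{(1-\rho^2)^{(n-1)/2}}{\pi\,\Gamma(n-2)}(1-r^2)^{(n-4)/2}\,\frac{d^{\,n-2}}{d(-\cos\theta)^{n-2}}\left(\frac{\theta}{\sin\theta}\right)dr \\ &= \frac{(1-\rho^2)^{(n-1)/2}}{\pi\,\Gamma(n-2)}(1-r^2)^{(n-4)/2}\,\frac{d^{\,n-2}}{d(r\rho)^{n-2}}\left\{\frac{\cos^{-1}(-\rho r)}{\sqrt{(1-\rho^2 r^2)}}\right\}dr \end{aligned} \tag{14.45}$$

This is as simple a form as can be given in terms of elementary functions.

14.15. In the particular case $\rho = 0$, (14.45) reduces to

$$dF = \frac{1}{B\{\frac{1}{2}, \frac{1}{2}(n-2)\}}(1-r^2)^{(n-4)/2}\,dr \tag{14.46}$$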



a form surmised by "Student" in 1908. This distribution provides a test of the hypothesis that an observed r arose from an uncorrelated normal population. Its distribution function may be obtained from incomplete B-functions, or more conveniently by putting

$$t = \frac{r\sqrt{(n-2)}}{\sqrt{(1-r^2)}} \tag{14.47}$$

which transforms (14.46) to

$$dF = \frac{1}{\sqrt{(n-2)}\;B\{\frac{1}{2}, \frac{1}{2}(n-2)\}}\cdot\frac{dt}{\left(1 + \dfrac{t^2}{n-2}\right)^{(n-1)/2}} \tag{14.48}$$

that is, to "Student's" form with n − 2 degrees of freedom.

The integral of this function has been tabulated and is given as Appendix Table 3. Fisher and Yates have also tabulated some of the significance points of t and of r itself, i.e. the values of r (for various n) for which the distribution function takes specified values.
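The transformation (14.47) is easily sketched in code. The following is an illustrative fragment only; the function name `r_to_t` is an assumption of this sketch and does not appear in the text. The resulting t is to be referred to "Student's" distribution with n − 2 degrees of freedom, using whatever tables are to hand.

```python
import math

def r_to_t(r, n):
    """Transform a sample correlation r from a sample of n pairs into
    the statistic t of (14.47), valid when the parent correlation is zero."""
    if not -1.0 < r < 1.0:
        raise ValueError("r must lie strictly between -1 and 1")
    if n < 3:
        raise ValueError("n must be at least 3")
    return r * math.sqrt(n - 2) / math.sqrt(1.0 - r * r)

# Illustrative values (hypothetical): r = 0.5 from a sample of 11 pairs.
t_value = r_to_t(0.5, 11)
degrees_of_freedom = 11 - 2
```

With r = 0.5 and n = 11 the statistic is 0.5 × 3 / √0.75, i.e. √3.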


14.16. The general distribution (14.45) has been studied in some detail, but lack of space prevents the inclusion of the extensive analysis involved. We will here indicate only the more important features of the results.

First, as to the shape of the frequency curves. When n = 2 the distribution becomes

$$dF = \frac{(1-\rho^2)^{1/2}\cos^{-1}(-\rho r)}{\pi(1-r^2)\sqrt{(1-\rho^2 r^2)}}\,dr$$

and the frequency curve may be written

$$y \propto \frac{\theta}{(1-r^2)\sin\theta}, \qquad \theta = \cos^{-1}(-\rho r) \tag{14.49}$$

For r = ± 1 the ordinate is infinite and the distribution will be found to be U-shaped. When n = 3 we find

$$y \propto \left\{\frac{1}{\sin^2\theta} - \frac{\theta\cos\theta}{\sin^3\theta}\right\}\frac{1}{(1-r^2)^{1/2}} \tag{14.50}$$

again a U-shaped distribution. For n = 4,

$$y \propto \frac{1}{\sin^3\theta}\{\theta - 3\cot\theta + 3\theta\cot^2\theta\} \tag{14.51}$$

If $\rho = 0$ this reduces to the rectangular form $y = \frac{1}{2}$. In other cases the curve is J-shaped, increasing from a minimum at r = − 1 to a maximum (but not an infinite maximum) at r = + 1.

For n > 4 the frequency curves are unimodal and tend to normality with large n, though slowly. Some interesting photographs of models of these curves are given in the "Co-operative Study" (1917).


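14.17. The moments of the distribution are expressible in terms of hypergeometric functions. Returning to (14.44) let us write

$$\alpha_1^2 = \frac{\sigma_1^2(1-\rho^2)}{n}, \qquad \alpha_2^2 = \frac{\sigma_2^2(1-\rho^2)}{n}.$$

After a little rearrangement the distribution becomes

$$dF = \frac{(1-\rho^2)^{(n-1)/2}}{\pi\,\Gamma(n-2)}\exp\left(-\frac{1}{2}\frac{s_1^2}{\alpha_1^2} - \frac{1}{2}\frac{s_2^2}{\alpha_2^2} + \frac{\rho r s_1 s_2}{\alpha_1\alpha_2}\right)\left(\frac{s_1}{\alpha_1}\right)^{n-2}\left(\frac{s_2}{\alpha_2}\right)^{n-2}(1-r^2)^{(n-4)/2}\,d\!\left(\frac{s_1}{\alpha_1}\right)d\!\left(\frac{s_2}{\alpha_2}\right)dr \tag{14.52}$$

Putting $u_1 = \dfrac{s_1}{\alpha_1}$, $u_2 = \dfrac{s_2}{\alpha_2}$ and expanding the term in $\exp\left(\dfrac{\rho r s_1 s_2}{\alpha_1\alpha_2}\right)$ we have

$$dF = \frac{(1-\rho^2)^{(n-1)/2}}{\pi\,\Gamma(n-2)}\exp\left(-\tfrac{1}{2}u_1^2 - \tfrac{1}{2}u_2^2\right)(u_1 u_2)^{n-2}(1-r^2)^{(n-4)/2} \times \sum_{j=0}^{\infty}\frac{(\rho r u_1 u_2)^j}{j!}\,du_1\,du_2\,dr \tag{14.53}$$

Integrating for $u_2$ from 0 to ∞ we find for the distribution of $u_1$ and r

$$dF = \frac{(1-\rho^2)^{(n-1)/2}}{\pi\,\Gamma(n-2)}\exp\left(-\tfrac{1}{2}u_1^2\right)u_1^{n-2}(1-r^2)^{(n-4)/2}\sum_{j=0}^{\infty}\frac{(\rho r u_1)^j}{j!}\,2^{(n+j-3)/2}\,\Gamma\{\tfrac{1}{2}(n+j-1)\}\,du_1\,dr \tag{14.54}$$

Multiplying by r and integrating from − 1 to + 1 we find, only the odd powers j = 2k + 1 contributing,

$$\frac{(1-\rho^2)^{(n-1)/2}}{\pi\,\Gamma(n-2)}\exp\left(-\tfrac{1}{2}u_1^2\right)\sum_{k=0}^{\infty}\frac{\rho^{2k+1}u_1^{n+2k-1}}{(2k+1)!}\,2^{(n+2k-2)/2}\,\Gamma(\tfrac{1}{2}n + k)\,B\{k + \tfrac{3}{2}, \tfrac{1}{2}(n-2)\}\,du_1$$

and finally, integrating with respect to $u_1$, we obtain

$$\mu_1'(r) = \frac{(1-\rho^2)^{(n-1)/2}}{\pi\,\Gamma(n-2)}\sum_{k=0}^{\infty}\frac{\rho^{2k+1}}{(2k+1)!}\,2^{n+2k-2}\,\Gamma^2(\tfrac{1}{2}n + k)\,B\{k + \tfrac{3}{2}, \tfrac{1}{2}(n-2)\}.$$

Substituting for the B-function in terms of Γ-functions and remembering that

$$\Gamma(x)\,\Gamma(x + \tfrac{1}{2}) = \sqrt{\pi}\;2^{1-2x}\,\Gamma(2x),$$

we find

$$\begin{aligned} \mu_1'(r) &= \frac{\rho(1-\rho^2)^{(n-1)/2}\,\Gamma^2(\frac{1}{2}n)}{\Gamma\{\frac{1}{2}(n-1)\}\,\Gamma\{\frac{1}{2}(n+1)\}}\left\{1 + \frac{(\frac{1}{2}n)^2}{\frac{1}{2}(n+1)}\rho^2 + \frac{(\frac{1}{2}n)^2(\frac{1}{2}n+1)^2}{\frac{1}{2}(n+1)\cdot\frac{1}{2}(n+3)}\cdot\frac{\rho^4}{2!} + \ldots\right\} \\ &= \frac{\rho(1-\rho^2)^{(n-1)/2}\,\Gamma^2(\frac{1}{2}n)}{\Gamma\{\frac{1}{2}(n-1)\}\,\Gamma\{\frac{1}{2}(n+1)\}}\,F\{\tfrac{1}{2}n, \tfrac{1}{2}n, \tfrac{1}{2}(n+1), \rho^2\} \end{aligned}$$

and since $F(\alpha, \beta, \gamma, x) = (1-x)^{\gamma-\alpha-\beta}F(\gamma-\alpha, \gamma-\beta, \gamma, x)$,

$$\mu_1'(r) = \frac{\rho\,\Gamma^2(\frac{1}{2}n)}{\Gamma\{\frac{1}{2}(n-1)\}\,\Gamma\{\frac{1}{2}(n+1)\}}\,F\{\tfrac{1}{2}, \tfrac{1}{2}, \tfrac{1}{2}(n+1), \rho^2\} \tag{14.55}$$

In a similar way it may be shown that

$$\mu_2'(r) = 1 - \frac{n-2}{n-1}(1-\rho^2)\,F\{1, 1, \tfrac{1}{2}(n+1), \rho^2\} \tag{14.56}$$

These series converge fairly rapidly for moderate or large n.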


14.18. The ordinates and distribution function of the correlation coefficient are not expressible in terms of simple mathematical functions. They have, however, been tabulated by David (1938) for values of n = 2(1)25, 50, 100, 200 and 400; for $\rho$ = 0.0(0.1)0.9; and for r = − 1.00(0.05) + 1.00, with finer intervals in places.

For many practical purposes it is sufficient to use a transformation of the distribution due to Fisher (1921). Putting

$$\left.\begin{aligned} r &= \tanh z, & z &= \tfrac{1}{2}\log\frac{1+r}{1-r} \\ \rho &= \tanh\zeta, & \zeta &= \tfrac{1}{2}\log\frac{1+\rho}{1-\rho} \end{aligned}\right\} \tag{14.57}$$

we may expand the frequency function of r in powers of $z - \zeta$, = x say, and inverse powers of n. Fisher gives the following expansion:


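$$\begin{aligned} f = \frac{n-2}{\sqrt{\{2\pi(n-1)\}}}\exp\left\{-\frac{n-1}{2}x^2\right\}\Bigg[&1 + \tfrac{1}{2}\rho x + \left\{\frac{2+\rho^2}{8(n-1)} + \frac{4-\rho^2}{8}x^2 + \frac{n-1}{12}x^4\right\} \\ &+ \rho x\left\{\frac{4-\rho^2}{16(n-1)} + \frac{4+3\rho^2}{48}x^2 + \frac{n-1}{24}x^4\right\} \\ &+ \left\{\frac{4+12\rho^2+9\rho^4}{128(n-1)^2} + \frac{8-2\rho^2+3\rho^4}{64(n-1)}x^2 + \ldots\right\} + \ldots\Bigg] \end{aligned} \tag{14.58}$$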

Taking moments about x = 0 we find, on transferring to the mean,

$$\mu_1' = \frac{\rho}{2(n-1)}\left\{1 + \frac{1+\rho^2}{8(n-1)} + \ldots\right\} \tag{14.59}$$

$$\mu_2 = \frac{1}{n-1}\left\{1 + \frac{4-\rho^2}{2(n-1)} + \frac{176 - 21\rho^2 - 21\rho^4}{48(n-1)^2} + \ldots\right\} \tag{14.60}$$

$$\mu_3 = \frac{\rho(\rho^2 - \frac{9}{16})}{(n-1)^3} + \ldots \tag{14.61}$$

$$\mu_4 = \frac{1}{(n-1)^2}\left\{3 + \frac{224 - 48\rho^2 - 3\rho^4}{16(n-1)} + \frac{1472 - 228\rho^2 - 141\rho^4 - 3\rho^6}{32(n-1)^2} + \ldots\right\} \tag{14.62}$$

The remarkable thing about the transformation is that the distribution of r, which is very skew, becomes the distribution of $z - \zeta$, which is nearly symmetrical. In fact,

$$\gamma_1 = \frac{\rho(\rho^2 - \frac{9}{16})}{(n-1)^{3/2}} + \ldots \tag{14.63}$$

$$\gamma_2 = \frac{32 - 3\rho^4}{16(n-1)} + \ldots \tag{14.64}$$


Thus we may take $z - \zeta$ to be approximately normally distributed with mean and variance given by (14.59) and (14.60). As a slightly rougher approximation we may take

$$\text{mean}\,(z - \zeta) = \frac{\rho}{2(n-1)} \tag{14.65}$$

$$\text{var}\,(z - \zeta) = \frac{1}{n-1} + \frac{4-\rho^2}{2(n-1)^2}$$

which is approximately equal for small $\rho$ to

$$\frac{1}{n-1} + \frac{2}{(n-1)^2} = \frac{1}{n-3} \text{ approximately.} \tag{14.66}$$

When n is moderate we may take a still rougher approximation by assuming $z - \zeta$ to be normally distributed about zero mean with variance $\dfrac{1}{n-3}$. Some comparisons of the various approximations are given in the introduction to David's tables, and it appears that for n > 50 the forms (14.65) and (14.66) are adequate. The approximation given by (14.59) and (14.60) appears to hold satisfactorily for values of n as low as 11.


14.19. Except in the case of the normal parent very little exact knowledge is available about the sampling distribution of the correlation coefficient. There is, however, some empirical evidence to justify the use of the above results when the population does not differ very much from the normal. E. S. Pearson (1931), in dealing with some experimental results, concluded that "the results suggest that the normal bivariate surface can be mutilated and distorted to a remarkable degree without affecting the frequency distribution of r." The subject does not seem to have been investigated mathematically except in special cases.


Example 14.6

In Example 14.3 we obtained for the correlation coefficient between wheat and potatoes a value of 0.2189. Suppose we regard the 48 counties as giving a random sample of the yields of wheat and potatoes, either for a wider area or for an extended period of years. The question then is, can such a value have arisen by chance from a population in which the yields of wheat and potatoes are uncorrelated?

From prior knowledge of crop yields we can assume with some confidence approximate normality in the parent population. Let us then test the hypothesis that the correlation in this population is zero.

We have

$$z = \tfrac{1}{2}\log\frac{1+r}{1-r} = \tfrac{1}{2}\log\frac{1.2189}{0.7811} = 0.2225, \qquad \zeta = 0, \qquad \frac{1}{\sqrt{(n-3)}} = \frac{1}{\sqrt{(45)}} = 0.1491.$$

The deviation $z - \zeta$ is thus 0.2225, or about 1.49 times the standard error. This is not very improbable and the observed correlation may thus be accidental.



Example 14.7

In a sample of 50 a correlation coefficient is found to be + 0.5. What is the probability that a value equal to or less than this should have been obtained from a normal population in which the correlation is + 0.7?

The exact value, from David's table, is, to five decimal places, 0.01289. Let us first of all take the approximation which assumes $z - \zeta$ to be distributed about zero mean with variance $\dfrac{1}{n-3}$. We have

$$z = \tfrac{1}{2}\log\frac{1+r}{1-r} = 0.5493, \qquad \zeta = \tfrac{1}{2}\log\frac{1+\rho}{1-\rho} = 0.8673,$$

$$\frac{1}{\sqrt{(n-3)}} = 0.1459, \qquad z - \zeta = -0.3180.$$

The deviation is thus 2.18 times the standard error, and the required probability, from the table of the normal integral, is 0.0146 approximately, compared with the true value of 0.0129. The approximate test is not quite stringent enough.

Let us then take $z - \zeta$ to be distributed normally about mean $\dfrac{\rho}{2(n-1)} = 0.00714$. The deviation is then − 0.3251, or 2.23 times the standard error, giving a probability of about 0.0129, almost the exact value.
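The arithmetic of this example can be checked with a short sketch. It is an illustration only: the function name `lower_tail_probability` and its `use_mean_correction` flag are assumptions of the sketch, and the normal integral is taken from Python's standard library rather than from the tables cited in the text.

```python
import math
from statistics import NormalDist

def lower_tail_probability(r, rho, n, use_mean_correction=False):
    """Approximate P(sample correlation <= r) for parent correlation rho
    and sample size n, using Fisher's z-transformation.

    The variance of z - zeta is taken as 1/(n - 3); the mean is taken as
    zero, or as rho / (2(n - 1)) when use_mean_correction is True."""
    z = math.atanh(r)        # z = (1/2) log((1 + r)/(1 - r))
    zeta = math.atanh(rho)
    mean = rho / (2.0 * (n - 1)) if use_mean_correction else 0.0
    standard_error = 1.0 / math.sqrt(n - 3)
    deviation = (z - zeta - mean) / standard_error
    return NormalDist().cdf(deviation)

rough = lower_tail_probability(0.5, 0.7, 50)
corrected = lower_tail_probability(0.5, 0.7, 50, use_mean_correction=True)
```

Here `rough` is about 0.0146 and `corrected` about 0.0129, in agreement with the figures quoted above.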

Example 14.8

In a sample of $n_1$ there is observed a correlation of $r_1$ and in a second sample of $n_2$ a correlation of $r_2$. Are the sample values $r_1$ and $r_2$ compatible with the hypothesis that the samples arose from the same population?

Suppose the hypothesis were true, and that $\rho$ is the correlation coefficient in the population. Then if $z_1 = \tanh^{-1} r_1$, $z_2 = \tanh^{-1} r_2$, $\zeta = \tanh^{-1}\rho$, we know that if the population were normal, $z_1 - \zeta$ will be distributed approximately normally with variance $\dfrac{1}{n_1 - 3}$ and $z_2 - \zeta$ with variance $\dfrac{1}{n_2 - 3}$. Thus the difference $z_1 - z_2 = (z_1 - \zeta) - (z_2 - \zeta)$ is distributed approximately normally with variance

$$\frac{1}{n_1 - 3} + \frac{1}{n_2 - 3},$$

and this will provide a test of the hypothesis.
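A minimal sketch of this test follows. The function name `compare_correlations` and the numerical values are hypothetical, and the two-sided probability is one reasonable way of referring the ratio to the normal integral; the text itself does not prescribe it.

```python
import math
from statistics import NormalDist

def compare_correlations(r1, n1, r2, n2):
    """Return (ratio, two_sided_probability) for the difference of two
    independent sample correlations, using z = arctanh(r) with
    variance 1/(n - 3) for each sample."""
    difference = math.atanh(r1) - math.atanh(r2)
    standard_error = math.sqrt(1.0 / (n1 - 3) + 1.0 / (n2 - 3))
    ratio = difference / standard_error
    probability = 2.0 * (1.0 - NormalDist().cdf(abs(ratio)))
    return ratio, probability

# Hypothetical illustration: r1 = 0.5 and r2 = 0.0, each from 28 pairs.
ratio, probability = compare_correlations(0.5, 28, 0.0, 28)
```

The population value ζ cancels from the difference, which is why the test needs no knowledge of ρ.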


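Distribution of Regression Coefficients in Normal Samples

14.20. Turning again to equation (14.44) we have, substituting $b_2 = \dfrac{r s_2}{s_1}$, the joint frequency-distribution of $s_1$, $s_2$ and $b_2$

$$dF \propto \exp\left[-\frac{n}{2(1-\rho^2)}\left(\frac{s_1^2}{\sigma_1^2} - \frac{2\rho b_2 s_1^2}{\sigma_1\sigma_2} + \frac{s_2^2}{\sigma_2^2}\right)\right] s_1^{n-1}\,s_2\,(s_2^2 - b_2^2 s_1^2)^{(n-4)/2}\,ds_1\,ds_2\,db_2.$$

Integration with respect to $s_2$ gives for the distribution of $s_1$ and $b_2$

$$dF \propto \exp\left[-\frac{n s_1^2}{2(1-\rho^2)}\left(\frac{1}{\sigma_1^2} - \frac{2\rho b_2}{\sigma_1\sigma_2} + \frac{b_2^2}{\sigma_2^2}\right)\right] s_1^{n-1}\,ds_1\,db_2.$$

A further integration with respect to $s_1$ gives for the distribution of $b_2$

$$dF \propto \frac{db_2}{\left\{\dfrac{\sigma_2^2}{\sigma_1^2}(1-\rho^2) + \left(b_2 - \dfrac{\rho\sigma_2}{\sigma_1}\right)^2\right\}^{n/2}}$$

or, on evaluation of the constant,

$$dF = \frac{\Gamma(\frac{1}{2}n)}{\sqrt{\pi}\;\Gamma\{\frac{1}{2}(n-1)\}}\cdot\frac{\sigma_2^{n-1}(1-\rho^2)^{(n-1)/2}}{\sigma_1^{n-1}}\cdot\frac{db_2}{\left\{\dfrac{\sigma_2^2}{\sigma_1^2}(1-\rho^2) + \left(b_2 - \dfrac{\rho\sigma_2}{\sigma_1}\right)^2\right\}^{n/2}} \tag{14.68}$$

The distribution of the regression coefficient $b_1$ is obtainable by interchanging the suffixes 1 and 2.

The form (14.68) is a Pearson Type VII distribution, symmetrical about the point $b_2 = \dfrac{\rho\sigma_2}{\sigma_1}$, the population regression coefficient. It tends to normality fairly rapidly, and the use of the standard error for regressions is therefore valid for lower values of n than in the case of the correlation coefficient. For small samples, however, (14.68) is not of much use since it depends on the unknown quantities $\sigma_1$, $\sigma_2$ and $\rho$, i.e. the population variances and covariance.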


14.21. It is possible to find statistics other than $b_1$ and $b_2$ which will provide a test of the regressions. Write

$$u = \frac{s_1(b_2 - \beta_2)}{(s_2^2 - b_2^2 s_1^2)^{1/2}} \tag{14.69}$$

where $\beta_2 = \rho\sigma_2/\sigma_1$ is the population regression coefficient. We now return to the distribution of the quantities a, b, c of equation (14.41), namely,

$$dF \propto \exp[-a + 2\rho b - c]\,(ac - b^2)^{(n-4)/2}\,da\,db\,dc \tag{14.70}$$

We have from (14.69)

$$u = \frac{b - \rho a}{\sqrt{(ac - b^2)}}$$

and on substituting for c in (14.70) we have, after a little reduction,

$$dF \propto \exp(-a)\,\frac{da}{a}\cdot\frac{du}{u^{n-1}}\cdot\exp\left\{2\rho b - \frac{b^2}{a} - \frac{(b-\rho a)^2}{a u^2}\right\}(b - \rho a)^{n-2}\,db.$$

The integral of the second part on the right for b will be found to give a factor proportional to

$$\exp(\rho^2 a)\left\{\frac{a u^2}{u^2 + 1}\right\}^{(n-1)/2}$$

and hence for the distribution of a and u we find

$$dF \propto a^{(n-3)/2}\exp(-a + \rho^2 a)\,da\;\frac{du}{(1+u^2)^{(n-1)/2}}.$$

Hence the distributions of a and u are independent, and for that of u we have

$$dF \propto \frac{du}{(1+u^2)^{(n-1)/2}} \tag{14.71}$$

This distribution does not contain any of the parent parameters. If we put

$$t = u\sqrt{(n-2)} \tag{14.72}$$

then t is distributed in "Student's" form

$$dF \propto \frac{dt}{\left(1 + \dfrac{t^2}{n-2}\right)^{(n-1)/2}} \tag{14.73}$$

and may be tested accordingly.

Example 14.9

In Example 14.3 we found for the regression of Y (potato yield) on X (wheat yield)

$$(Y - 6.005) = 0.0783\,(X - 15.791).$$

The regression coefficient is small. Could it have arisen from a population in which there is no correlation, i.e. in which $\beta_2 = 0$?

From Example 14.3 we have

$$b_2 = 0.0783, \qquad \sqrt{(n-2)} = 6.7823, \qquad s_1^2 = 4.1749, \qquad s_2^2 = 0.5340.$$

Hence from (14.72)

$$t = \frac{b_2 s_1\sqrt{(n-2)}}{\sqrt{(s_2^2 - s_1^2 b_2^2)}} = 2.06.$$

Appendix Table 3 does not carry us as far as $\nu = 46$. From the Fisher-Yates tables, however, we have the following values of t for P = 0.05:

$\nu = 40$, t = 2.021; $\nu = 60$, t = 2.000,

and for P = 0.02:

$\nu = 40$, t = 2.423; $\nu = 60$, t = 2.390.

Thus in our case P evidently lies between 0.05 and 0.02, and the regression may not be significant, i.e. the two variates may be independent. This confirms the conclusion reached in Example 14.6 from consideration of the correlation coefficient.


14.22. Up to this point we have considered the correlation coefficient mainly as a measure of the relationship between two variates, and this is the standpoint which will mainly concern us in this and the succeeding chapter. We may, however, turn for a time to a consideration of the regression equations, which have an importance of their own. Assuming that the regression is approximately linear, we have two equations

$$\left.\begin{aligned} X - \bar{x} &= b_1(Y - \bar{y}) \\ Y - \bar{y} &= b_2(X - \bar{x}) \end{aligned}\right\} \tag{14.74}$$

expressing the relations between the means of variate arrays and the variate-values determining those arrays. A problem which frequently presents itself in practice is the following: given a member of the population exhibiting a variate-value x, what is its y-value? Evidently there is in general no unique answer to this question. For any given x there will be an array of y's, any one of which might be exhibited by the member under consideration. But in the absence of any special knowledge it is reasonable to take as the best estimate of y the mean of this array. If the population is normal the mean will be the modal value, and if it is approximately normal the mean will be a reasonable estimate, the greater part of the population values lying distributed within a range of two or three times the standard deviation of the array.

In fact, the question as put is too restrictive. There is no unique value of y corresponding to a given x, and we are entitled to enquire only after the distribution of y's or their principal characteristics.

Now the mean required is given by the regression equation, and hence that equation may be used to estimate the y-value corresponding to a given x. If at the same time the variance of the y-array can be determined, the probable limits of error of the estimate may also be assigned. This is particularly easy for normal populations because, as we have seen (14.8), the variance of all x-arrays is $\sigma_1^2(1-\rho^2)$ and that of the y-arrays $\sigma_2^2(1-\rho^2)$. As usual in large samples, we can use the sample values to calculate these variances; or we may take the variance of the array direct from observation.


Example 14.10

In Example 14.1 we found for the regression equations, in the units there employed,

X − 0.7706 = − 1.417 (Y + 0.2095)

Y + 0.2095 = − 0.2658 (X − 0.7706).

Suppose we require to estimate the highest audible pitch for a man 34 years of age. In our units this corresponds to an x-value of ⅓(34 − 22) = 4. Our estimate of y is then

− 0.2095 − 0.2658 (4 − 0.7706) = − 1.0679 units.

This corresponds, in vibrations per second, to

19,995 − (1.0679) × 2000 = 17,900 vibrations.

The variance of the estimate is $s_2^2(1 - r^2)$

= 13.3482 {1 − (0.6136)²} thousands²

= 8.322 thousands²,

so that the standard error is √8.322 = 2.9 units = 5.8 thousand vibrations. The estimate is evidently not very accurate, for the value of y can vary within two or three times this range without very great improbability.

If the problem had been set in the reverse form: what is the age corresponding to a vibration of 17.9 thousands, we should have

X = 0.7706 − 1.417 (− 1.0679 + 0.2095)

= 1.99 units

= 27.98 years.

This is not very close to 34 years, the age from which we started; and in general, if $\xi$ is the estimate of x, given y = $\eta$, $\eta$ is not the estimate of y, given x = $\xi$. We have a right to expect such a concordance only when r is near unity or when $\xi$ and $\eta$ are near the means of the distribution, where the regression lines intersect.
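The lack of concordance between the two regression lines is readily seen by running the estimate forwards and then backwards. The sketch below uses the coefficients quoted in the example; the function names are hypothetical and serve only this illustration.

```python
def estimate_y_from_x(x):
    """Regression of Y on X from Example 14.10 (coded units)."""
    return -0.2095 - 0.2658 * (x - 0.7706)

def estimate_x_from_y(y):
    """Regression of X on Y from Example 14.10 (coded units)."""
    return 0.7706 - 1.417 * (y + 0.2095)

x_start = 4.0                             # age 34 in coded units
y_estimate = estimate_y_from_x(x_start)   # about -1.0679
x_back = estimate_x_from_y(y_estimate)    # about 1.99, not 4

# The round trip shrinks the deviation from the mean by the product of
# the two regression coefficients, i.e. by r squared.
shrinkage = (x_back - 0.7706) / (x_start - 0.7706)
```

The value of `shrinkage` is 1.417 × 0.2658, or about 0.3766, which is r² for these data to within rounding of the coefficients; only when r² is near unity does the return journey end near the starting point.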



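The Correlation Ratios

14.23. For any bivariate distribution we have, if $\bar{x}_p$ is the mean of the pth x-array and $\bar{x}$ the mean of the whole,

$$\begin{aligned} \Sigma(x - \bar{x})^2 &= \Sigma(x - \bar{x}_p + \bar{x}_p - \bar{x})^2 \\ &= \Sigma(x - \bar{x}_p)^2 + \Sigma(\bar{x}_p - \bar{x})^2, \end{aligned} \tag{14.75}$$

the product term $2\Sigma(x - \bar{x}_p)(\bar{x}_p - \bar{x})$ vanishing because $\bar{x}_p - \bar{x}$ is constant for any given array.

The correlation ratio of x on y, $\eta_{xy}$, is defined by

$$\eta_{xy}^2 = \frac{\Sigma(\bar{x}_p - \bar{x})^2}{\Sigma(x - \bar{x})^2} \tag{14.76}$$

and similarly that of y on x, $\eta_{yx}$, by

$$\eta_{yx}^2 = \frac{\Sigma(\bar{y}_p - \bar{y})^2}{\Sigma(y - \bar{y})^2} \tag{14.77}$$

Analogously to (14.75) we have, $\beta_1 y_p$ being the value given for the pth array by the linear regression of x on y,

$$\begin{aligned} \Sigma(x - \beta_1 y_p)^2 &= \Sigma(x - \bar{x}_p + \bar{x}_p - \beta_1 y_p)^2 \\ &= \Sigma(x - \bar{x}_p)^2 + \Sigma(\bar{x}_p - \beta_1 y_p)^2. \end{aligned} \tag{14.78}$$

But, from (14.11), with an origin at the mean,

$$\Sigma(x - \beta_1 y_p)^2 = (1 - \rho^2)\,\Sigma(x - \bar{x})^2,$$

and from (14.76), the result remaining true for an origin at the mean,

$$(1 - \eta_{xy}^2)\,\Sigma(x - \bar{x})^2 = \Sigma(x - \bar{x}_p)^2.$$

Taking these results in conjunction with (14.78) we find

$$\Sigma(x - \bar{x})^2\,(\eta_{xy}^2 - \rho^2) = \Sigma(\bar{x}_p - \beta_1 y_p)^2.$$

Hence $\eta_{xy}^2$ cannot be less than $\rho^2$. If and only if $\eta_{xy}^2 = \rho^2$, $\bar{x}_p - \beta_1 y_p$ vanishes for each array, i.e. the regression is linear. $\eta^2 - \rho^2$ may thus be used as an index of linearity of regression.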

Example 14.11

The calculation of the correlation ratios is based on equation (14.76). As an illustration we will find those for the data of Table 14.1. The means of the horizontal arrays and the array frequencies are shown in Table 14.5.

TABLE 14.5

Calculation of the Correlation Ratio $\eta_{xy}$ for the Data of Table 14.1.

    Highest          Frequency    Mean x̄_p      x̄_p − x̄       (x̄_p − x̄)²
    Audible Pitch
       5–                 3        4.666,667     3.896,025     15.179,011
       7–                45        9.111,111     8.340,469     69.563,423
       9–                10        9.700,000     8.929,358     79.733,434
      11–               104        8.817,308     8.046,666     64.748,834
      13–                93        6.333,333     5.562,691     30.943,531
      15–               310        3.022,581     2.251,939      5.071,229
      17–               576        1.064,236     0.293,594      0.086,197
      19–              1051        0.101,808   − 0.668,834      0.447,339
      21–               957      − 0.801,463   − 1.572,105      2.471,514
      23–               165      − 1.278,788   − 2.049,430      4.200,163
      25–                41      − 1.512,195   − 2.282,837      5.211,345
      27–                16      − 1.562,500   − 2.333,142      5.443,552
      29–                 2      − 1.000,000   − 1.770,642      3.135,173
      31–                 2      − 3.000,000   − 3.770,642     14.217,741
      33–                 4      − 1.750,000   − 2.520,642      6.353,636

We have already found that

$$\Sigma(x^2) = 47{,}392, \qquad \Sigma(x) = 2604,$$

from which

$$\Sigma(x - \bar{x})^2 = 45{,}385.25.$$

From the table we now have

$$\Sigma(\bar{x}_p - \bar{x})^2 = 19{,}095.88.$$

It should be noticed that in forming this sum we multiply each $(\bar{x}_p - \bar{x})^2$ in the last column of Table 14.5 by the corresponding frequency in the second column, for the summation takes place over all values of x.

We then find

$$\eta_{xy}^2 = \frac{19{,}095.88}{45{,}385.25} = 0.420{,}751,$$

giving $\eta_{xy}$ = 0.6487. Similarly it may be shown that $\eta_{yx}$ = 0.6231. The correlation coefficient is − 0.6136.

We have

$$\eta_{xy}^2 - r^2 = 0.044, \qquad \eta_{yx}^2 - r^2 = 0.012.$$

These values are close to zero and the regressions are thus approximately linear.
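The calculation in this example can be reproduced from the array frequencies and means alone. The sketch below is illustrative; the variable names are hypothetical, the array means are entered to six decimals as tabulated, and the sum of squares 47,392 is taken from the text.

```python
# (frequency, array mean) for each pitch class of Table 14.5.
arrays = [
    (3, 4.666667), (45, 9.111111), (10, 9.700000), (104, 8.817308),
    (93, 6.333333), (310, 3.022581), (576, 1.064236), (1051, 0.101808),
    (957, -0.801463), (165, -1.278788), (41, -1.512195), (16, -1.562500),
    (2, -1.000000), (2, -3.000000), (4, -1.750000),
]
sum_of_squares = 47392.0   # sum of x squared over all observations, from the text

total_frequency = sum(f for f, _ in arrays)
sum_x = sum(f * mean for f, mean in arrays)
grand_mean = sum_x / total_frequency

# Total sum of squares about the grand mean.
total_ss = sum_of_squares - sum_x ** 2 / total_frequency

# Between-array sum of squares: each squared deviation of an array mean
# is weighted by the frequency of that array.
between_ss = sum(f * (mean - grand_mean) ** 2 for f, mean in arrays)

eta_squared = between_ss / total_ss
eta = eta_squared ** 0.5
```

The weighting by frequency in `between_ss` is the point emphasised in the text: the summation runs over all observations, not over the fifteen arrays.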


14.24. We shall see in the next chapter that $\eta^2$ is closely related to a statistic R, the multiple correlation coefficient, which is of rather greater importance. We accordingly defer a full discussion of the sampling distribution of $\eta^2$ until that chapter, but will here derive it in the special case of samples from an uncorrelated bivariate population.

From (14.75) and (14.76) we have

$$\frac{1 - \eta^2}{\eta^2} = \frac{\Sigma(x - \bar{x}_p)^2}{\Sigma(\bar{x}_p - \bar{x})^2} \tag{14.79}$$

Now if the population is normal and the arrays are of narrow width, the distribution in each array will be normal. We have already seen that in a normal distribution the mean is distributed independently of the variance. Hence $\Sigma(x - \bar{x}_p)^2$, which is the sum of numerical multiples of array variances, is independent of the array means and hence of the quantity $\Sigma(\bar{x}_p - \bar{x})^2$. Thus the numerator and denominator of (14.79) are independent.

Further, if the variates are uncorrelated and therefore (in the normal case) independent, the distributions in parent arrays have all the same mean and variance, those of the total distribution. Without loss of generality we may take the mean to be zero and the variance to be unity.

It was seen in Example 10.6 that the distribution of the sum of squares, t, of a variates, each distributed normally with zero mean and unit variance, is given by

$$dF \propto e^{-\frac{1}{2}t}\,t^{\frac{1}{2}(a-2)}\,dt \tag{14.80}$$

and that the distribution of the sum of squares about the mean is the same in form but has a reduced by unity. Now $\Sigma(x - \bar{x}_p)^2$ summed over any given array containing $N_p$ members is the sum of squares about the mean of $N_p$ variates and is thus distributed in the form (14.80) with $N_p - 1$ degrees of freedom, that is to say with $a = N_p - 1$. Thus the sum of $(x - \bar{x}_p)^2$ for the whole distribution will be distributed in the form (14.80) with $\Sigma(N_p - 1) = N - p$ degrees of freedom, i.e. as

$$dF \propto e^{-\frac{1}{2}t}\,t^{\frac{1}{2}(N-p-2)}\,dt \tag{14.81}$$

The mean $\bar{x}_p$ will be distributed in the normal form

$$dF \propto e^{-\frac{1}{2}N_p\bar{x}_p^2}\,d\bar{x}_p$$

and consequently $\Sigma(\bar{x}_p - \bar{x})^2$, which is equal to $\sum_p N_p(\bar{x}_p - \bar{x})^2$ (the summation now extending over the p arrays), will be distributed in the form (14.80) with p − 1 degrees of freedom; i.e., writing u for the sum, as

$$dF \propto e^{-\frac{1}{2}u}\,u^{\frac{1}{2}(p-3)}\,du \tag{14.82}$$

To find the distribution of $\dfrac{1-\eta^2}{\eta^2}$ we then have to find that of $\dfrac{t}{u}$, t and u being independent.

We have for the joint distribution

$$dF \propto \exp[-\tfrac{1}{2}(t + u)]\,t^{\frac{1}{2}(N-p-2)}\,u^{\frac{1}{2}(p-3)}\,dt\,du \tag{14.83}$$

Put $\xi = \dfrac{t}{u}$, $\zeta = t + u$.

The Jacobian of the transformation is

$$\frac{\partial(\xi, \zeta)}{\partial(t, u)} = \frac{t + u}{u^2}$$

and (14.83) becomes

$$dF \propto e^{-\frac{1}{2}\zeta}\,\zeta^{\frac{1}{2}(N-3)}\,\frac{\xi^{\frac{1}{2}(N-p-2)}}{(1+\xi)^{\frac{1}{2}(N-1)}}\,d\xi\,d\zeta.$$

Thus $\xi$ and $\zeta$ are independent and we have for the distribution of $\xi$

$$dF \propto \frac{\xi^{\frac{1}{2}(N-p-2)}}{(1+\xi)^{\frac{1}{2}(N-1)}}\,d\xi \tag{14.84}$$

whence, on putting $\xi = \dfrac{1-\eta^2}{\eta^2}$, we find

$$\begin{aligned} dF &\propto (1-\eta^2)^{\frac{1}{2}(N-p-2)}\,(\eta^2)^{\frac{1}{2}(p-3)}\,d(\eta^2) \\ &= \frac{1}{B\{\frac{1}{2}(p-1), \frac{1}{2}(N-p)\}}\,(1-\eta^2)^{\frac{1}{2}(N-p-2)}\,(\eta^2)^{\frac{1}{2}(p-3)}\,d(\eta^2) \end{aligned} \tag{14.85}$$

which is the distribution required.


14.25. The distribution function of (14.85), which is a Pearson Type I curve, may be obtained from the incomplete B-function. It is sufficient for ordinary purposes, however, to use the tabulated forms of Fisher's z-distribution (Example 10.18). In fact, putting in (14.85)

$$\nu_1 = p - 1, \qquad \nu_2 = N - p, \qquad e^{2z} = \frac{\eta^2}{1-\eta^2}\cdot\frac{N-p}{p-1},$$

we find

$$dF \propto \frac{e^{\nu_1 z}\,dz}{(\nu_1 e^{2z} + \nu_2)^{\frac{1}{2}(\nu_1+\nu_2)}},$$

the form of equation (10.62). Appendix Tables 4 and 5 give the values of z, such that equal or greater values will be attained with probability 0.05 and 0.01. These tables are due to Fisher and reproduced from his Statistical Methods for Research Workers. In practice, however, $\eta^2$ is only calculated for large values of N outside the range of these tables, and we may either use the approximation suggested therein or special Tables by T. L. Woo reproduced in Tables for Statisticians and Biometricians, Part II.


14.26. It is easy to show that the first two moments of (14.85) and the constants $\gamma_1$ and $\gamma_2$ are given by

$$\mu_1' = \frac{p-1}{N-1} \tag{14.86}$$

$$\mu_2 = \frac{2(p-1)(N-p)}{(N-1)^2(N+1)} \tag{14.87}$$

$$\gamma_1^2 = \frac{8(N-2p+1)^2(N+1)}{(p-1)(N-p)(N+3)^2} \tag{14.88}$$

$$\gamma_2 = \frac{12\{N^3 + N^2(4-5p) + N(5p^2-12p+6) + 7p^2 - 7p + 1\}}{(p-1)(N-p)(N+3)(N+5)} \tag{14.89}$$

Thus, to order $N^{-1}$,

$$\gamma_1^2 = \frac{8}{p-1}, \qquad \gamma_2 = \frac{12}{p-1} \tag{14.90}$$

and thus $\eta^2$ does not tend to normality for large N for any finite number of arrays p.
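The moments of this section may be checked against the general moments of the B-distribution, since under the hypothesis of no correlation η² is a Type I variate with indices ½(p − 1) and ½(N − p). The following sketch is illustrative only; the function names are hypothetical.

```python
def eta_squared_null_moments(N, p):
    """Mean, variance and squared skewness of eta squared for N
    observations in p arrays, uncorrelated normal parent."""
    mean = (p - 1) / (N - 1)
    variance = 2 * (p - 1) * (N - p) / ((N - 1) ** 2 * (N + 1))
    gamma1_squared = (8 * (N - 2 * p + 1) ** 2 * (N + 1)
                      / ((p - 1) * (N - p) * (N + 3) ** 2))
    return mean, variance, gamma1_squared

def beta_moments(alpha, beta):
    """Mean and variance of a B-distribution with indices alpha, beta."""
    total = alpha + beta
    return alpha / total, alpha * beta / (total ** 2 * (total + 1))

mean, variance, gamma1_squared = eta_squared_null_moments(100, 5)
beta_mean, beta_variance = beta_moments((5 - 1) / 2, (100 - 5) / 2)

# For very large N the squared skewness approaches 8 / (p - 1), which
# does not vanish: the distribution does not tend to normality.
_, _, limiting_skewness = eta_squared_null_moments(10 ** 6, 5)
```

The persistence of skewness as N increases is the numerical counterpart of the closing remark of the section.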


Tetrachoric r

14.27. We now proceed to consider two coefficients designed for the measurement of dependence and based on the product-moment correlation coefficient, tetrachoric r and biserial $\eta$. Both these coefficients are, in effect, estimates of a putative product-moment correlation for data which are not specified with the detail of an ordinary bivariate table.

Suppose we have a fourfold table

$$\begin{array}{cc|c} a & b & a + b \\ c & d & c + d \\ \hline a + c & b + d & N \end{array} \tag{14.91}$$

If this table is derived by a double dichotomy of a bivariate frequency-distribution

$$z = z_0\exp\left\{-\frac{1}{2(1-\rho^2)}\left(\frac{x^2}{\sigma_1^2} - \frac{2\rho xy}{\sigma_1\sigma_2} + \frac{y^2}{\sigma_2^2}\right)\right\}$$

we may ask, what is the value of $\rho$ in terms of a, b, c, d and N? This problem is, in fact, determinate.

If the population is normal the array totals will be normal, and thus the frequencies (a + c) and (b + d) correspond to a dichotomy of the normal curve, i.e. there exists an h′ such that

$$\left.\begin{aligned} \frac{a+c}{N} &= \int_{-\infty}^{h'}\int_{-\infty}^{\infty} z\,dx\,dy \\ \frac{b+d}{N} &= \int_{h'}^{\infty}\int_{-\infty}^{\infty} z\,dx\,dy = \frac{1}{\sigma_1\sqrt{(2\pi)}}\int_{h'}^{\infty}\exp\left(-\frac{x^2}{2\sigma_1^2}\right)dx \end{aligned}\right\} \tag{14.92}$$

Putting $h = \dfrac{h'}{\sigma_1}$ we have

$$\frac{1}{\sqrt{(2\pi)}}\int_h^{\infty}\exp(-\tfrac{1}{2}x^2)\,dx = \frac{b+d}{N} \tag{14.93}$$

so that h can be derived from the tables of the normal integral.
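The points of dichotomy can be found in code in place of the tables. The sketch below is an illustration under stated assumptions: the function name `dichotomy_points` and the cell frequencies are hypothetical, the second dichotomy is assumed to divide the rows (a, b) from (c, d) in the same manner as the first divides the columns, and only the points h and k are found, not the tetrachoric r itself.

```python
from statistics import NormalDist

def dichotomy_points(a, b, c, d):
    """Standardised points of dichotomy (h, k) of a fourfold table.

    h cuts off the upper proportion (b + d)/N of the first variate and
    k the upper proportion (c + d)/N of the second, as in (14.93)."""
    N = a + b + c + d
    standard_normal = NormalDist()
    h = standard_normal.inv_cdf((a + c) / N)   # lower proportion is (a + c)/N
    k = standard_normal.inv_cdf((a + b) / N)   # lower proportion is (a + b)/N
    return h, k

# Hypothetical fourfold table.
h, k = dichotomy_points(30, 20, 30, 20)
```

For this table (a + c)/N = 0.6, giving h of about 0.2533, while the rows are equally divided and k is zero.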

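Similarly there will be a k defined by

$$\frac{1}{\sqrt{(2\pi)}}\int_k^{\infty}\exp(-\tfrac{1}{2}y^2)\,dy = \frac{c+d}{N}.$$

We then require to solve for $\rho$ the equation

$$\frac{d}{N} = \int_h^{\infty}\int_k^{\infty}\frac{1}{2\pi\sqrt{(1-\rho^2)}}\exp\left[-\frac{1}{2(1-\rho^2)}(x^2 - 2\rho xy + y^2)\right]dx\,dy \tag{14.94}$$

We will expand the integral on the right in ascending powers of $\rho$. The characteristic function of the distribution is

$$\phi(t, u) = \exp\{-\tfrac{1}{2}(t^2 + 2\rho tu + u^2)\}.$$

Thus

$$\begin{aligned} \frac{d}{N} &= \int_h^{\infty}\int_k^{\infty}dx\,dy\;\frac{1}{4\pi^2}\int_{-\infty}^{\infty}\int_{-\infty}^{\infty}\phi(t, u)\exp(-itx - iuy)\,dt\,du \\ &= \frac{1}{4\pi^2}\int_h^{\infty}\int_k^{\infty}dx\,dy\int_{-\infty}^{\infty}\int_{-\infty}^{\infty}\exp\{-\tfrac{1}{2}(t^2 + u^2) - itx - iuy\}\sum_{j=0}^{\infty}\frac{(-\rho)^j t^j u^j}{j!}\,dt\,du \end{aligned} \tag{14.95}$$

The coefficient of $(-\rho)^j/j!$ is the product of two integrals, the first of which is

$$\frac{1}{2\pi}\int_h^{\infty}dx\int_{-\infty}^{\infty}\exp(-\tfrac{1}{2}t^2 - itx)\,t^j\,dt$$

and the second a similar expression in k, y and u. Now from 6.24 the integral with respect to t is equal to

$$(-i)^j H_j(x)\,\alpha(x),$$

where $\alpha(x)$ is the ordinate of the normal curve in standard measure and $H_j$ the Tchebycheff-Hermite polynomial of order j, and hence the double integral is, for $j \geq 1$,

$$(-i)^j H_{j-1}(h)\,\alpha(h).$$

Hence, from (14.95),

$$\frac{d}{N} = \frac{(b+d)(c+d)}{N^2} + \sum_{j=1}^{\infty}\frac{\rho^j}{j!}\,H_{j-1}(h)\,H_{j-1}(k)\,\alpha(h)\,\alpha(k).$$

In the notation of 6.27 we write $\tau$ for the tetrachoric function of h and $\tau'$ for that of k, and we then have

$$\frac{d}{N} = \sum_{j=0}^{\infty}\rho^j\,\tau_j\,\tau_j' \tag{14.96}$$

The tetrachoric functions have been tabled up to $\tau_{19}$ (Tables for Statisticians and Biometricians, Parts I and II) and, with their aid, (14.96) can be solved by successive approximation. Examples will be found in the introduction to the Tables.

14.28. It is to be realised that the coefficient obtained by the solution of equation (14.96) is not a product-moment correlation, but an estimate of the parameter $\rho$ in a bivariate normal population. It is not an estimate of the product-moment correlation in non-normal populations. Its practical use is limited largely by arithmetical inconvenience, both in the solution of (14.96) and in the determination of sampling variances. Karl Pearson (1913) has given expressions for these quantities, but as nothing is known of the distribution of tetrachoric r it is not clear how far the use of a standard error is justifiable.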

Biserial rj 

14 . 29 . Suppose now that we have a 2 x g-fold table, the dichotomy being according 
to some qualitative factor and the other classification either to a numerical variate or to 
some variate permitting the arrangement of the classes in order. 

Table 14.6 will illustrate the type of material under discussion. The data relate to
1426 criminals classified according to whether they were alcoholic or not and according to
the crime for which they were imprisoned. The order of the crime-classification is
determined by its relationship with intelligence, arson being associated with low intelligence and
fraud with high.

TABLE 14.6

Showing 1426 Criminals classified according to Alcoholism and Type of Crime.
(C. Goring's data, quoted by K. Pearson, 1909.)

                 Arson.  Rape.  Violence.  Stealing.  Coining.  Fraud.  Totals.
Alcoholic           50     88      155        379        18       63      753
Non-alcoholic       43     62      110        300        14      144      673
Totals              93    150      265        679        32      207     1426

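If the population is normal, $\eta = \rho$. We have

\[ \eta^2 = \frac{\sum N_p(\bar{y}_p - \bar{y})^2}{N \operatorname{var} y}
          = \frac{\sum (N_p\bar{y}_p^2 - 2N_p\bar{y}_p\bar{y} + N_p\bar{y}^2)}{N \operatorname{var} y}. \]

Since $\sum N_p\bar{y}_p = N\bar{y}$, this gives

\[ \eta^2 = \frac{1}{N}\sum N_p\Bigl(\frac{\bar{y}_p}{\sqrt{\operatorname{var} y}}\Bigr)^2 - \Bigl(\frac{\bar{y}}{\sqrt{\operatorname{var} y}}\Bigr)^2. \tag{14.97} \]

Thus

\[ \eta^2 = \frac{\operatorname{var} y_p}{\operatorname{var} y}\Bigl\{\frac{1}{N}\sum N_p\Bigl(\frac{\bar{y}_p}{\sqrt{\operatorname{var} y_p}}\Bigr)^2 - \Bigl(\frac{\bar{y}}{\sqrt{\operatorname{var} y_p}}\Bigr)^2\Bigr\}. \tag{14.98} \]

But the mean variance of arrays, weighted according to the numbers in arrays,
$= \operatorname{var} y\,(1 - \rho^2) = \operatorname{var} y\,(1 - \eta^2)$. Taking this as equal to $\operatorname{var} y_p$ we have

\[ \eta^2 = (1 - \eta^2)\Bigl\{\frac{1}{N}\sum N_p\Bigl(\frac{\bar{y}_p}{\sqrt{\operatorname{var} y_p}}\Bigr)^2 - \Bigl(\frac{\bar{y}}{\sqrt{\operatorname{var} y_p}}\Bigr)^2\Bigr\}, \]

giving

\[ \eta^2 = \frac{\dfrac{1}{N}\sum N_p\Bigl(\dfrac{\bar{y}_p}{\sqrt{\operatorname{var} y_p}}\Bigr)^2 - \Bigl(\dfrac{\bar{y}}{\sqrt{\operatorname{var} y_p}}\Bigr)^2}
                 {1 + \dfrac{1}{N}\sum N_p\Bigl(\dfrac{\bar{y}_p}{\sqrt{\operatorname{var} y_p}}\Bigr)^2 - \Bigl(\dfrac{\bar{y}}{\sqrt{\operatorname{var} y_p}}\Bigr)^2}. \tag{14.99} \]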


The use of this expression lies in the fact that the quantities in it can be estimated
from the data on certain assumptions. If we suppose that the quantity according to which
dichotomy has been made (in our example, alcoholism) is capable of representation by
a variate which is normally distributed, and thus that each y-array is a dichotomy of
a normal curve, the quantities $\bar{y}/\sqrt{\operatorname{var} y_p}$ and $\bar{y}_p/\sqrt{\operatorname{var} y_p}$ can be obtained from the tables of
the normal integral. For example, the two frequencies alcoholic and non-alcoholic are,
for arson, 50 and 43. Thus the proportional frequency in the alcoholic group is $50/93 = 0.5376$
and the deviation corresponding to this frequency is seen from the tables to be 0.0944,
which is thus $\bar{y}_p/\sqrt{\operatorname{var} y_p}$ for this y-array.


Example 14.12

For the data of Table 14.6 the proportional frequencies, the values of $\bar{y}_p/\sqrt{\operatorname{var} y_p}$ and
the $N_p$ are as follows:

                            Arson.   Rape.  Violence. Stealing. Coining.  Fraud.   Total.
Alcoholic                   0.5376  0.5867   0.5849    0.5582    0.5625   0.3043   0.5281
ybar_p / sqrt(var y_p)      0.0944  0.2190   0.2144    0.1463    0.1573  -0.5119   0.0704
N_p                             93     150      265       679       32      207     1426




Then from (14.99) we have

\[ \eta^2 = \frac{\dfrac{1}{1426}\{93(0.0944)^2 + \ldots\} - (0.0704)^2}{1 + \dfrac{1}{1426}\{93(0.0944)^2 + \ldots\} - (0.0704)^2}, \]

giving $\eta^2 = 0.05456$,

$\eta = 0.234$,

which, on our various assumptions, may be taken as approximating to the supposed product- 
moment correlation coefficient. 
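
The arithmetic of this example can be checked mechanically. The following Python sketch is not part of the original text: it applies (14.99) to the frequencies of Table 14.6 and takes the normal deviates from the standard library's `statistics.NormalDist` instead of printed tables. The function name `biserial_eta` and the layout of `counts` are illustrative assumptions.

```python
from statistics import NormalDist

# Table 14.6: (alcoholic, non-alcoholic) frequencies for each type of crime.
counts = {
    "Arson": (50, 43),
    "Rape": (88, 62),
    "Violence": (155, 110),
    "Stealing": (379, 300),
    "Coining": (18, 14),
    "Fraud": (63, 144),
}


def biserial_eta(table):
    """Biserial eta by (14.99), assuming a normal variate underlies the dichotomy."""
    inv = NormalDist().inv_cdf
    n_total = sum(a + b for a, b in table.values())
    a_total = sum(a for a, _ in table.values())
    # Deviate for the whole population: ybar / sqrt(var y_p).
    z_all = inv(a_total / n_total)
    # Weighted mean square of the array deviates ybar_p / sqrt(var y_p).
    mean_sq = sum((a + b) * inv(a / (a + b)) ** 2 for a, b in table.values()) / n_total
    s = mean_sq - z_all ** 2
    return (s / (1 + s)) ** 0.5


eta = biserial_eta(counts)
print(round(eta, 3))
```

Small differences from the printed working are to be expected, since the tables used in the text give the deviates to four places only.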

As for tetrachoric r, the sampling distribution of biserial η is unknown. Expressions
for its sampling variance have been derived by K. Pearson (1916), but are to be used with
considerable reserve.

14.30. Something may also be said about the assumptions on which tetrachoric r and
biserial η are based, particularly that of normality. In supposing that a given fourfold
table is the double dichotomy of a normal population, we are assuming that the attributes
or variates concerned are capable of representation on a normal scale and that it was, in
fact, this scale which determined the classification given. This assumption is evidently
a considerable one and cannot always be made with much confidence. In dividing criminals
into alcoholic and non-alcoholic it would, for example, be assumed that "alcoholism"
is a quantity which varies continuously from one subject to another; or perhaps that 
propensity to alcoholism was such a variate. At one end of the scale we should have 
chronic inebriety, at the other the most austere teetotalism. It would be further assumed 
that if the degree of alcoholism could be measured, the population of criminals would be 
distributed according to the alcoholic variate in a normal form; and it would be further 
assumed that the data which are given would have been arrived at by a dichotomy of the 
population according to the variate. How far assumptions of this kind are justified depends 
on previous knowledge and the circumstances of individual cases ; but even so it remains 
largely a matter of personal opinion. The reader will meet widely divergent views in the 
literature of the subject. 

Intra-class Correlation

14.31. There sometimes arise, mainly in biological work, cases in which we require
the correlation between members of one or more families. We might, for example, wish
to examine the correlation between heights of brothers. The question then arises, which 
is the first variate and which the second ? In the simplest case we might have a number 
of families each containing two brothers. Our correlation table has two variates, both 
height, but in order to complete it we must decide which brother is to be related to which 
variate. One way of doing so would be to take the elder brother first, or the taller brother ; 
but this would provide the answer to different questions, the correlation between elder and 
younger brothers, or between taller and shorter brothers; not the correlation between 
brothers in general. 

The problem is met by entering in the correlation table both possible pairs, i.e. those
obtained by taking both brothers first. Generally, if the family contains $k$ members, there
will be $k(k - 1)$ entries, each member being taken first in association with each other
member second. If there are $p$ families with $k_1, k_2, \ldots, k_p$ members there will be
$\sum_i k_i(k_i - 1)$ entries in the correlation table. As a simple illustration consider five families
of three brothers with heights

1st family   69, 70, 72 inches
2nd family   70, 71, 72   "
3rd family   71, 72, 72   "
4th family   68, 70, 70   "
5th family   71, 72, 73   "

There will be 30 entries in the table, which will be as follows:


Height (inches).

           68    69    70    71    72    73   Totals.
68          -     -     2     -     -     -      2
69          -     -     1     -     1     -      2
70          2     1     2     1     2     -      8
71          -     -     1     -     4     1      6
72          -     1     2     4     2     1     10
73          -     -     -     1     1     -      2
Totals      2     2     8     6    10     2     30


Here, for example, the pair 69, 70 in the first family is entered as (69, 70) and (70, 69) 
and the pair 72, 72 in the third family twice as (72, 72). 

The table is symmetrical about the diagonal, as it evidently must be. We may calculate 
the product-moment coefficient in the usual way. We find var x = var y = 1.716,
cov (x, y) = 0.516 and hence ρ = 0.516/1.716 = 0.301.
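
The construction of the symmetrical table is easily mechanised. The sketch below, in Python and not part of the original text, enters every ordered pair of distinct brothers for the five families above and then computes the ordinary product-moment coefficient of the resulting 30 entries. The helper names `symmetric_pairs` and `product_moment` are illustrative assumptions.

```python
from collections import Counter
from itertools import permutations

# Heights (inches) of the five families of three brothers.
families = [
    (69, 70, 72),
    (70, 71, 72),
    (71, 72, 72),
    (68, 70, 70),
    (71, 72, 73),
]


def symmetric_pairs(groups):
    """Enter each member first in association with each other member second."""
    pairs = []
    for members in groups:
        pairs.extend(permutations(members, 2))
    return pairs


def product_moment(pairs):
    """Ordinary product-moment correlation of a list of (x, y) entries."""
    n = len(pairs)
    mx = sum(x for x, _ in pairs) / n
    my = sum(y for _, y in pairs) / n
    var_x = sum((x - mx) ** 2 for x, _ in pairs) / n
    var_y = sum((y - my) ** 2 for _, y in pairs) / n
    cov = sum((x - mx) * (y - my) for x, y in pairs) / n
    return cov / (var_x * var_y) ** 0.5


pairs = symmetric_pairs(families)
table = Counter(pairs)  # cell frequencies of the correlation table
rho = product_moment(pairs)
print(len(pairs), table[(71, 72)], round(rho, 3))
```

`permutations` works on positions, so the two brothers of height 72 in the third family yield the entry (72, 72) twice, as the text requires.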

The actual compilation of such a table is, however, both tedious and unnecessary. 
The coefficient p can be found by direct methods, as follows:— 

Suppose there are $p$ families, with variate-values $x_{11} \ldots x_{1k_1}$, $x_{21} \ldots x_{2k_2}$, $\ldots$,
$x_{p1} \ldots x_{pk_p}$, the families numbering $k_1, k_2, \ldots, k_p$. In the correlation table each member
of the $i$th family will appear $k_i - 1$ times (in association with the other members of the
family), and thus the mean of each variate is given by

\[ \bar{x} = \bar{y} = \frac{1}{N}\sum_i \Bigl\{(k_i - 1)\sum_j x_{ij}\Bigr\}, \tag{14.100} \]

where $N = \sum_i k_i(k_i - 1)$ is the number of entries in the table,
the first summation taking place over the $p$ families and the second over all members of
the $i$th family. Similarly

\[ \operatorname{var} x = \operatorname{var} y = \frac{1}{N}\sum_i \Bigl\{(k_i - 1)\sum_j (x_{ij} - \bar{x})^2\Bigr\} \tag{14.101} \]

and

\[ \operatorname{cov}(x, y) = \frac{1}{N}\sum_i \sum_{j,l} (x_{ij} - \bar{x})(x_{il} - \bar{x}), \quad j \neq l, \tag{14.102} \]

the summation $\sum_{j,l}$ extending over all possible pairs for which $j \neq l$. Thus the coefficient ρ is
given by

\[ \rho = \frac{\sum_i \sum_{j,l} (x_{ij} - \bar{x})(x_{il} - \bar{x})}{\sum_i (k_i - 1)\sum_j (x_{ij} - \bar{x})^2}. \tag{14.103} \]

This can be thrown into a rather more convenient form. We have

\[ \sum_i \sum_{j,l} (x_{ij} - \bar{x})(x_{il} - \bar{x}) = \sum_i {\sum_{j,l}}' (x_{ij} - \bar{x})(x_{il} - \bar{x}) - \sum_i \sum_j (x_{ij} - \bar{x})^2 \]

(where the summation $\sum'$ now extends over all possible pairs, including $j = l$)

\[ = \sum_i k_i^2(\bar{x}_i - \bar{x})^2 - \sum_i \sum_j (x_{ij} - \bar{x})^2, \tag{14.104} \]

$\bar{x}_i$ being the mean of the $i$th family.

Thus

\[ \rho = \frac{\sum_i k_i^2(\bar{x}_i - \bar{x})^2 - \sum_i \sum_j (x_{ij} - \bar{x})^2}{\sum_i (k_i - 1)\sum_j (x_{ij} - \bar{x})^2}. \tag{14.105} \]

If all the families have the same number of members this formula is somewhat simplified.
Denoting by $v$ the variance of $x$, and by $v_m$ the variance of means of families (about the
mean $\bar{x}$), we have

\[ \rho = \frac{pk^2 v_m - pkv}{(k - 1)pkv} = \frac{1}{k - 1}\Bigl(\frac{k v_m}{v} - 1\Bigr). \tag{14.106} \]

The coefficient ρ is called the Intra-class Correlation Coefficient, to distinguish it from
the ordinary product-moment coefficient.


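Example 14.13

Let us use formula (14.106) to find the intra-class coefficient for the example of the
above section. With a working mean at 70 inches, the values of the variates are
-1, 0, 2 ; 0, 1, 2 ; 1, 2, 2 ; -2, 0, 0 ; 1, 2, 3.

Hence

\[ \bar{x} = \frac{13}{15}, \qquad \operatorname{var}(x) = \frac{1}{15}\{(-1)^2 + 0^2 + \ldots\} - \Bigl(\frac{13}{15}\Bigr)^2 = \frac{386}{225}. \]

The means of families are

\[ \frac{5}{15},\ \frac{15}{15},\ \frac{25}{15},\ -\frac{10}{15},\ \frac{30}{15} \]

and the deviations from $\bar{x}$

\[ -\frac{8}{15},\ \frac{2}{15},\ \frac{12}{15},\ -\frac{23}{15},\ \frac{17}{15}. \]

Thus

\[ v_m = \frac{1}{5}\Bigl\{\Bigl(\frac{8}{15}\Bigr)^2 + \text{etc.}\Bigr\} = \frac{1030}{1125}. \]

Hence, from (14.106)

\[ \rho = \frac{1}{2}\Bigl\{\frac{3 \cdot 1030 \cdot 225}{1125 \cdot 386} - 1\Bigr\} \]

= 0.301,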

a result we have already found directly. 
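
The direct formulae lend themselves to a short computation. The Python sketch below is not part of the original text: `intraclass_rho` follows (14.105) and allows families of unequal size, while `intraclass_rho_equal` follows the simplified form (14.106). Both function names are illustrative assumptions, and the data are the five families of the example.

```python
def intraclass_rho(groups):
    """Intra-class correlation by (14.105); family sizes may differ."""
    # Weighted mean (14.100): each member enters the table (k_i - 1) times.
    n_entries = sum(len(g) * (len(g) - 1) for g in groups)
    mean = sum((len(g) - 1) * sum(g) for g in groups) / n_entries
    numerator = 0.0
    denominator = 0.0
    for g in groups:
        k = len(g)
        family_mean = sum(g) / k
        ss = sum((x - mean) ** 2 for x in g)
        numerator += k * k * (family_mean - mean) ** 2 - ss
        denominator += (k - 1) * ss
    return numerator / denominator


def intraclass_rho_equal(groups):
    """Simplified form (14.106) for p families of k members each."""
    k = len(groups[0])
    values = [x for g in groups for x in g]
    mean = sum(values) / len(values)
    v = sum((x - mean) ** 2 for x in values) / len(values)
    v_m = sum((sum(g) / k - mean) ** 2 for g in groups) / len(groups)
    return (k * v_m / v - 1) / (k - 1)


families = [(69, 70, 72), (70, 71, 72), (71, 72, 72), (68, 70, 70), (71, 72, 73)]
rho_general = intraclass_rho(families)
rho_equal = intraclass_rho_equal(families)
print(round(rho_general, 3), round(rho_equal, 3))
```

With equal family sizes the weighted mean of (14.100) coincides with the ordinary mean, so the two functions should agree.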




14.32. One caution is necessary in the interpretation of the intra-class correlation
coefficient. From (14.106) it is seen that intra-class ρ cannot be less than $-\dfrac{1}{k - 1}$, though
it may attain $+1$. It is thus a skew coefficient in the sense that, unlike product-moment
correlation and association, a negative value has not the same significance (as a departure
from independence) as the equivalent positive value.
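
The two limits can be seen numerically. In the Python sketch below (not part of the original text, with made-up data and an illustrative function name), families whose means are all equal give $v_m = 0$ and hence the lower limit $-1/(k-1)$, while families whose members are identical give $v_m = v$ and hence $+1$.

```python
def intraclass_rho(families):
    """Equal-sized families: rho = (k * v_m / v - 1) / (k - 1), as in (14.106)."""
    k = len(families[0])
    values = [x for fam in families for x in fam]
    mean = sum(values) / len(values)
    v = sum((x - mean) ** 2 for x in values) / len(values)
    v_m = sum((sum(fam) / k - mean) ** 2 for fam in families) / len(families)
    return (k * v_m / v - 1) / (k - 1)


# Hypothetical data: every family has mean 2, so the family means do not vary.
identical_means = [(1, 2, 3), (3, 1, 2), (0, 2, 4)]
# Hypothetical data: no variation inside any family.
identical_members = [(1, 1, 1), (2, 2, 2), (5, 5, 5)]

lowest = intraclass_rho(identical_means)     # k = 3, so the limit is -1/2
highest = intraclass_rho(identical_members)
print(lowest, highest)
```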

14.33. The sampling distribution of intra-class p for the case of a normal population 
and equal numbers in families may be obtained as follows :— 

It may be shown, precisely as in (14.25), that the ratio of two sums of squares about 

means, I = —, based on N — p and p — 1 sums, is distributed as 


Vt 


dF oc 


|J(V-p-2) 


(14.107) 


provided that the sums are independent and emanate from normal populations. Here 
alf oi are the population variances relating to v*, Vi respectively. 

Consider now $p$ families of $k$ members, $pk$ in all, as $p$ samples of $k$ from a normal
population in which the intra-class coefficient is $\lambda$. Writing $l$ for the sample intra-class coefficient
we have


I 


k-l\v } 

. 

where | =-5*. Now relates to means of samples and is distributed independently 

tn 

of V — v^, as in the case of (14.79). We may therefore substitute for | in (14.107), with 
N =pk and p — p. Furthermore, since the population value of v — is of and that of 


V- IS 


ui 

h-l’ 


we have 




kal 


(k — l)of -f-trl 




(k — 1) of -f- d% 


. (14.109) 



After a little reduction (14.107) becomes

\[ dF \propto \frac{(1 - l)^{\frac{1}{2}\{p(k-1)-2\}}\,\{1 + (k-1)l\}^{\frac{1}{2}(p-3)}\,dl}{\{1 - \lambda + \lambda(k-1)(1-l)\}^{\frac{1}{2}(pk-1)}}. \tag{14.110} \]

As for the product-moment coefficient, this form may be brought closer to normality by
putting

\[ l = \tanh z, \qquad \lambda = \tanh \zeta. \]

In the particular case $k = 2$ we find

e~**8eoh*’~*3 d* 

dF QC-r—-j;-^ 

co8h»’“*(* — C) 

which has the remarkable property of depending only on $z - \zeta$, i.e. of being the same in
form for any $\zeta$ or $\lambda$. Writing $z - \zeta = x$ we may derive from (14.111) the expansion

^ r{n-lW{ 2 n) L 12 

f»-l . (»-l)*-nri 1 ** t , 17a^ 1 /,4,,nv 

- {-ir** - S88"*’}JL' -^*-8+48+W-J' 


givmg 


fii = - 


2(« - 1)1 ' 2(w - 1) 


• n - 1\ • 2(w - 1) ' 6(w - 1)* 

1 

“ (n _ 1)» . 


(» - 1) > 


^ - 1 4(w - 1)* ' 


whence 


{n - l)i 

2 

y, = --- + 


. (14.113) 


. (14.114) 
. (14.115) 
, (14.116) 
. (14.117) 


. (14.118) 


n - 1 (ft - 1)“ • • -/ 

illustrating the tendency to normality. 2 — f may be taken to be distributed normally 

about zero mean with variance ;;-- approximately. 

(w -1) 

For the general case the substitution 

2{k — 1)1 = k — 2 + k tanh (z — 0)1 

I • • 

I 


2(ib — 1)A = fc — 2 + Jfc tanh (C — 0) 


k — 2 

where tanh 0 = — r —» reduces (14.110) to 

tC 


. (14.119) 


t^) 


2~r{{k - l)p 




Kexp{- 


(i-2)p + l 


-.)} 


X sech —^{x ~-Q)dx , 

m 


. (14.120) 


where, as usual, $x = z - \zeta$.



NOTES AND REFERENCES

The classical theory of product-moment correlation, beginning with Galton and Karl
Pearson, was established by Yule (1897, 1907). The sampling problem for the normal case
was solved by Fisher (1915) and studied by subsequent writers, culminating in Miss David's
tables of 1938. For experimental work on the sampling distribution see E. S. Pearson
(1931). A method of deriving the distribution alternative to the geometrical approach and
relying on characteristic functions has been given by Kullback (1934).

Tetrachoric ρ and biserial η are both inventions of Karl Pearson's, but the tetrachoric
series has been discovered by many writers, priority apparently being due to Mehler (1876).
For controversy on the nature and scope of tetrachoric ρ see references in previous chapter.

Intra-class correlation is formally equivalent to a linear function of the ratio of two
variances and thus becomes a branch of quadratic analysis (analysis of variance) which will
be dealt with in the second volume.

Co-operative Study (1917) (H. E. Soper, A. W. Young, B. M. Cave, A. Lee and K. Pearson),
    "On the distribution of the correlation coefficient in small samples," Biometrika,
    11, 328.
David, F. N. (1938), Tables of the Correlation Coefficient, Cambridge University Press.
Fisher, R. A. (1915), "The frequency distribution of the values of the correlation coefficient
    in samples from an indefinitely large population," Biometrika, 10, 507.
- (1921), "On the probable error of a coefficient of correlation deduced from a small
    sample," Metron, 1, No. 4, 3.
Kullback, S. (1934), "An application of characteristic functions to the distribution problem
    of statistics," Ann. Math. Statist., 5, 263.
Mehler, G. (1876), "Reihenentwicklung nach Laplaceschen Functionen höherer Ordnung,"
    J. für Math., 66, 161.
Pearson, K. (1909), "On a new method of determining correlation between a measured
    characteristic A and a character B, etc.," Biometrika, 7, 96.
- (1910), "On a new method of determining correlation when one variable is given by
    alternative and the other by multiple categories," Biometrika, 7, 248.
- (1911), "On the correction necessary for the correlation ratio η," Biometrika, 8, 254,
    and (1923), 14, 412.
- (1913), "On the probable error of a coefficient of correlation as found from a fourfold
    table," Biometrika, 9, 22.
- (1915), "On the probable error of biserial η," Biometrika, 11, 292.
- and Pearson, E. S. (1922), "On polychoric coefficients of correlation," Biometrika,
    14, 127.
Pearson, E. S. (1931), "The test of significance for the correlation coefficient," Jour. Amer.
    Statist. Ass., 26, 128, and (1932) 27, 424. See also Cheshire, L., Oldis, E., and
    Pearson, E. S. (1932), ibid., 27, 121.
Ritchie-Scott, A. (1918), "The correlation coefficient of a polychoric table," Biometrika,
    12, 93.
Yule, G. U. (1897), "On the theory of correlation," Jour. Roy. Statist. Soc., 60, 812.
- (1907), "On the theory of correlation for any number of variables treated by a new
    system of notation," Proc. Roy. Soc., A, 79, 182.





EXERCISES 

14.1. Show that the data of Table 1.26 have the following constants (« » age, 
y = milk yield): 

mean x = 6*22 years. mean y = 18-Cl gallons. 

V(var x) — 2-21 „ ^(var y) =* 3-37 „ 

p = 0'219, rjjcy — 0-242, = 0*266. 


14.2. Show that for the data of Table 14.2 

p = — 0*014, = 0*14, rjyj. 


0*38. 


14.3. Show that the smaller angle between the regression lines is

\[ \arctan\Bigl\{\frac{1 - \rho^2}{\rho}\,\frac{\sigma_1\sigma_2}{\sigma_1^2 + \sigma_2^2}\Bigr\}. \]


14.4. If a bivariate normal surface is dichotomised at its medians and $a$ is the
proportional frequency in the positive compartment of the 2 × 2 table so generated (i.e. the
compartment including the limits $+\infty$), show that

\[ \rho = \cos\,(1 - 2a)\pi. \]

(Sheppard, Phil. Trans. Roy. Soc., 1898, 192A, 101.)

14.5. Show that the ordinates of the sampling distribution of the correlation coefficient 
r in samples from a normal parent with correlation p obey the reciurence relation 

2n — 1 n — 1 

yn+a — ^a/n+l + 

where n is the sample number and 

_ _ pr V(1 - P*) V(1 -r^) „ _ (1 - P“)(l - r*) 

r=^pV ’ -1 ~p-^^ ~ • 

(Co»operative Study, 1917.) 


j _ 

14.6. By the transformation cosh z — pr — -—— show that the ordinate of the 

1 — u 

distribution of r may be expressed as 

_ » - 2 (1 - p*)“i~ (1 - r*)"r r{n - 1) 

*'• V(2>.) n”-i) 


f 1* a , U.3* 

\ 2*«-i • 2'*.2'*'(n-J)(n + i) 

where a — . 


1*.3“..'>* a* 

2^2*.2“ (»r^i)(¥Ti)^+l) " 


14.7. Show that the characteristic function of 

fl - e 

^ 2(1 - p*)£r?’ * (1 - p*)a,«T,’ 




2(1 - pV* 





in normal samples is 

' _(Lr _ 

{(1 - i<,)(i - it,) -p^{i + »<.)*}»’ 

where ti refers to 9i, and so on. Hence show that the distribution of variances and co- 
variances has the same characteristic function, except for constants, but with the value 
of n reduced by unity. Show that the simultaneous distribution of these quantities is then 
that of equation (14.42) with 0i = na, 6, = npb, 6, — nc. 

(Kullback, 1934.) 


14.8. From the distribution of equation (14.42) show that the distribution of 

3 IS 

V = ._?/_? and r is given by 

OTi/Cft 


dF oc 


n--4 

yn-2(l _ 


Integrating for r from 


(1 — 2prv + v*)““i 
1 to + 1 by putting 


drdv. 


r — 


m(A + ju) + (1 
show that the distribution of v is 


u{X + fi) ~[1 - u)(X - fi) 


dF = 2(1-p i^)'^' f, _ 4pV 

sf- (1 + (1 

^,""2 ’ r) 


(This gives the distribution of the variance ratio when the variates are correlated. 
The result is due to S. S. Bose (1935), Sankkydy 2, 65. The derivation was given by 
Finney, Biometrika, 1938, 30, 190.) 


14.9. Show that in samples from a normal bivariate population the variance of 6* is 
given exactly by 

and that for the distribution of 6* 

yi = 0 

6 

yt == — 

n — 5 

14.10. By considering the joint distribution of and 6, in normal samples, show that 
the regression of on is linear, but that of on 6, is not linear and does not tend to 
linearity for largo samples. 


14.11. Writing the bivariate frequency function in the form 

fix, y) = /(-c) 

BO that the jth moment about the origin of the y array for given z is 




show that 

(where <fi is the obaraetwistic function of the distribution) so that 

Verify that the bivariate normal distribution has linear r^ressions and is homosoedaslio. 

14.12. (Data of E. M. Elderton, quoted by K. Pearson, 1910.) The following table
shows 811 sons classified according to alcoholism of parent and health of son:

Son.

                 Healthy.  Fairly     Delicate.  Phthisical or  Died     Totals.
                           healthy.              epileptic.     young.
Alcoholic           122        9         24            8          42       205
Non-alcoholic       328       37         71           37         133       606
Totals              450       46         95           45         175       811

Show that biserial η = 0.089, indicating little correlation between health of son and
consumption of alcohol by parent.


14.13, (Data from 0. H. Latter, Biometrika, 4, 1905, p. 363.) 

The following table shows the length of cuckoos’ eggs fostered by various birds :— 


Length of Egg (units ^ inillime^tre). 


Foster Parent. 

... 

40 

41 

42 

43 

44 

46 

46 

47 

48 49 

50 

TOTAIiS. 

Bobin. 

1 

1 

8 

3 

9 

13 

20 

6 

11 ^ 2 

2 

76 

Wren. 

7 

6 

14 

! 

8 

1 

9 

1 

6 

3 

2 

; 

! 

1 

! 

54 

Hedge-Sparrow 

— 

— 

2 

" 

6 

14 

1 

13 

3 

1 

6 — 

I 

3 

68 

Totals 

8 

6 

24 

16 

32 

32 

30 

11 


6 j 

188 


Show that the coefficient of intra-class correlation is + 0-22, 


14.14. A series of measurements are subject to errors of observation which may be
supposed uncorrelated with the magnitudes of the measurements. If $x_1$, $y_1$ refer to the
observed deviations from arithmetic means and $x$, $y$ to the true deviations, show that
$\sum(x_1 y_1) = \sum(xy)$, but that var $x_1$ > var $x$ ; var $y_1$ > var $y$. Hence show that the observed
correlation is less than the true correlation.


14.15. If three variables Xu X„ X» are uncorrelated and the deviatioi^s are small 
compared with their mean values Mi, and if,, show that the vsbriauce of is 
approximately 

iff/var Xi 2 cov (X,, X,) , var X,^ 


iffV M\ 


X X 

and that the correlation between ~ and ~ is 


X, 
P = 


MiM, 

X. 


+ 




where ef 


var Xi 

~mJ 


v'(vf + + w|) 


, etc. 


Note that this is positive, so that there is a “ spurious ” correlation between the two 
indices and 



CHAPTER 15 


PARTIAL AND MULTIPLE CORRELATION 

15.1. The product-moment coefficient of correlation can, as has been seen in the
last chapter, be used to measure the relationship between two variates which are distributed 
either exactly or approximately in the normal form. When we come to interpret such 
a correlation, however, we meet the same sort of problem which arose in Chapter 13 in 
connection with associations : if a variate 1 is correlated with a variate 2, may this not 
be due to the fact that both are correlated with a variate 3 ? The question may be decided 
by considering the correlation of 1 and 2 in the sub-populations for which variate 3 is 
constant, and in this chapter we consider the theory of such partial correlations, which 
bear an obvious analogy to the partial associations of Chapter 13. The subject may best 
be broached by extending to several variables the theory of linear regression developed 
for two variables in the previous chapter. 


15.2. Suppose, in fact, that there is given a set of $N$ individuals considered according
to $p$ variates $x_1, x_2, \ldots, x_p$, so that to each individual there correspond $p$ variate-values.
We may, for example, be given a set of men according to height, weight, age and income,
or a set of counties according to wheat-yields, hours of sunshine per annum, inches of
rainfall per annum, and mean height above sea-level. In general, any variate may be
considered as dependent on the others and for any variate, say $x_1$, we may require to find
the "best" linear relation of the form

\[ x_1 = \alpha + \beta_2 x_2 + \beta_3 x_3 + \ldots + \beta_p x_p, \tag{15.1} \]

a generalisation of (14.8). As before, the constants may be determined by the principle
of least squares, i.e. so that

\[ U = \sum (x_1 - \alpha - \beta_2 x_2 - \ldots - \beta_p x_p)^2 \tag{15.2} \]

is a minimum, the summation extending over the $N$ members of the population. We
shall then have

\[ \frac{\partial U}{\partial \alpha} = -2\sum (x_1 - \alpha - \beta_2 x_2 - \ldots - \beta_p x_p) = 0, \tag{15.3} \]

and if we take the variables measured from their means, this reduces to $\alpha = 0$. With
this convention we have $(p - 1)$ equations of type $\partial U/\partial \beta_k = 0$, i.e.

\[ \sum x_k(x_1 - \beta_2 x_2 - \ldots - \beta_p x_p) = 0 \]

or

\[ \operatorname{cov}(x_k, x_1) - \beta_2 \operatorname{cov}(x_k, x_2) - \ldots - \beta_k \operatorname{var} x_k - \ldots - \beta_p \operatorname{cov}(x_k, x_p) = 0, \quad k = 2, 3, \ldots, p. \tag{15.4} \]

These $(p - 1)$ equations can be solved for the $(p - 1)$ quantities $\beta$ and hence the required
form (15.1) is determinate.
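
The solution of the $(p - 1)$ equations may be sketched in Python as follows. This is an illustration only, not part of the original text: the data are made up so that $x_1 = 2x_2 - x_3 + 1$ exactly, and the helpers `solve`, `cov` and `partial_regression` are hypothetical names. The equations are solved by ordinary Gaussian elimination, using nothing beyond the standard language.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def cov(u, v):
    """Covariance of two equally long lists (divisor N)."""
    n = len(u)
    mu, mv = sum(u) / n, sum(v) / n
    return sum((a - mu) * (b - mv) for a, b in zip(u, v)) / n


def partial_regression(x1, others):
    """Coefficients beta_2 ... beta_p of (15.1) from the equations (15.4)."""
    a = [[cov(xk, xj) for xj in others] for xk in others]
    b = [cov(xk, x1) for xk in others]
    return solve(a, b)


# Hypothetical observations constructed so that x1 = 2*x2 - x3 + 1.
x2 = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
x3 = [2.0, 1.0, 4.0, 3.0, 6.0, 5.0]
x1 = [2 * a - b + 1 for a, b in zip(x2, x3)]

betas = partial_regression(x1, [x2, x3])
print([round(b, 6) for b in betas])
```

Because the covariances are taken about the means, the constant term drops out, in agreement with the remark that (15.3) reduces to $\alpha = 0$.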


15.3. In the notation introduced by Yule we write

\[ x_1 = \beta_{12.34\ldots p}\,x_2 + \beta_{13.24\ldots p}\,x_3 + \ldots + \beta_{1p.23\ldots(p-1)}\,x_p, \tag{15.5} \]

which is the regression equation of $x_1$ on $x_2, \ldots, x_p$ referred to the means of the variates.
The quantities $\beta$ are called Partial Regression Coefficients. The first subscript to the left
of the period in each $\beta$ is that of the variate on the left of the regression equation, and
the second subscript is that of the variate to which it is attached. These are called Primary
Subscripts. The subscripts on the right of the period are those of the remaining variables
and are called Secondary Subscripts.

When no confusion is likely to arise we can write (15.5) in the simpler form

\[ x_1 = \beta_2 x_2 + \ldots + \beta_p x_p, \tag{15.6} \]

that is to say, we may drop the first primary and the secondary subscripts.

The order of the primary subscripts is material, $\beta_{12.k}$ being different from $\beta_{21.k}$; but
that of the secondary subscripts is not.

Write

\[ x_{1.23\ldots p} = x_1 - \beta_{12.34\ldots p}\,x_2 - \ldots - \beta_{1p.23\ldots(p-1)}\,x_p. \tag{15.7} \]

$x_{1.23\ldots p}$ may then be called the residual of $x_1$ of order $p - 1$. It is the difference between
the observed $x_1$ and the value given by the regression equation. If all the residuals are
zero, and only in this case, the regression is exactly linear. The $\beta$'s were determined so
as to make the sum of squares of residuals a minimum.

Write also

\[ \operatorname{var}(x_{1.23\ldots p}) = \sigma^2_{1.23\ldots p}, \tag{15.8} \]

so that $\sigma_{1.23\ldots p}$ is the standard deviation of residuals and corresponds to the standard
deviation of arrays considered in 14.22.

15.4. From (15.7), equations (15.4) may be written

\[ \sum (x_k\,x_{1.23\ldots p}) = 0, \quad k = 2, \ldots, p, \tag{15.9} \]

and generally we shall have

\[ \sum (x_k\,x_{j.12\ldots(j-1)(j+1)\ldots p}) = 0, \quad k \neq j, \tag{15.10} \]

i.e. the covariance of any residual and any variate is zero, provided that the subscript of
the latter occurs among the secondary subscripts of the former.

More generally still,

\[ \sum (x_{1.34\ldots p}\,x_{2.34\ldots p}) = \sum \{x_{1.34\ldots p}(x_2 - \beta_{23.4\ldots p}\,x_3 - \ldots - \beta_{2p.34\ldots(p-1)}\,x_p)\} \]

and each term on the right vanishes in virtue of (15.9) except the first, so that

\[ \sum (x_{1.34\ldots p}\,x_{2.34\ldots p}) = \sum (x_{1.34\ldots p}\,x_2) \tag{15.11} \]

\[ = \sum (x_1\,x_{2.34\ldots p}) \tag{15.12} \]

by symmetry.

Thus the covariance of any two residuals is unaltered by omitting any or all of the
secondary subscripts of either which are common to both. Conversely the covariance of
any residual with p secondary suffixes and a residual with those p secondary suffixes and 
q additional ones is unaltered by adding to the former any of the q of the latter. 

As a corollary, any covariance is zero if all the subscripts of one residual occur among 
the secondary subscripts of the second. 
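
These properties can be verified numerically for three variates. The Python sketch below is not part of the original text; the observations are made up and all names are illustrative. It forms the residual $x_{1.23}$ from the two-regressor solution of (15.4) and the residuals $x_{1.3}$, $x_{2.3}$ from simple regressions on $x_3$, and then evaluates the sums appearing in (15.9), (15.11) and (15.12).

```python
def centred(v):
    """Deviations of the values from their mean."""
    m = sum(v) / len(v)
    return [a - m for a in v]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


# Hypothetical observations, measured from their means.
x1 = centred([3.0, 5.0, 4.0, 8.0, 6.0, 10.0])
x2 = centred([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
x3 = centred([2.0, 1.0, 4.0, 3.0, 6.0, 5.0])

c12, c13, c23 = dot(x1, x2), dot(x1, x3), dot(x2, x3)
c22, c33 = dot(x2, x2), dot(x3, x3)

# Solution of the two equations (15.4) for beta_12.3 and beta_13.2.
det = c22 * c33 - c23 ** 2
b12_3 = (c12 * c33 - c13 * c23) / det
b13_2 = (c13 * c22 - c12 * c23) / det
x1_23 = [a - b12_3 * b - b13_2 * c for a, b, c in zip(x1, x2, x3)]

# First-order residuals on x3 alone.
x1_3 = [a - (c13 / c33) * c for a, c in zip(x1, x3)]
x2_3 = [b - (c23 / c33) * c for b, c in zip(x2, x3)]

checks = {
    "sum x1.23 * x2": dot(x1_23, x2),    # zero by (15.9)
    "sum x1.23 * x3": dot(x1_23, x3),    # zero by (15.9)
    "sum x1.3 * x2.3": dot(x1_3, x2_3),  # the three sums below agree,
    "sum x1.3 * x2": dot(x1_3, x2),      # by (15.11)
    "sum x1 * x2.3": dot(x1, x2_3),      # and (15.12)
}
for name, value in checks.items():
    print(name, round(value, 9))
```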

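15.5. In virtue of these results we have

\[ 0 = \sum (x_{2.34\ldots p}\,x_{1.23\ldots p}) = \sum \{x_{2.34\ldots p}(x_1 - \beta_{12.34\ldots p}\,x_2 - \ldots)\} \]

\[ = \sum (x_{2.34\ldots p}\,x_1) - \beta_{12.34\ldots p}\sum (x_{2.34\ldots p}\,x_2) \]

\[ = \sum (x_{2.34\ldots p}\,x_{1.34\ldots p}) - \beta_{12.34\ldots p}\sum (x^2_{2.34\ldots p}), \]

and thus, writing $q$ for the group of suffixes $34 \ldots p$, we have

\[ \beta_{12.q} = \frac{\operatorname{cov}(x_{1.q}, x_{2.q})}{\operatorname{var}(x_{2.q})}, \tag{15.13} \]

a generalisation of (14.6).

Similarly

\[ \beta_{21.q} = \frac{\operatorname{cov}(x_{1.q}, x_{2.q})}{\operatorname{var}(x_{1.q})}. \tag{15.14} \]

We may then define a coefficient $\rho_{12.q} = \rho_{12.34\ldots p}$ by the equation

\[ \rho_{12.q} = (\beta_{12.q}\,\beta_{21.q})^{\frac{1}{2}} = \frac{\operatorname{cov}(x_{1.q}, x_{2.q})}{\{\operatorname{var}(x_{1.q})\operatorname{var}(x_{2.q})\}^{\frac{1}{2}}}. \tag{15.15} \]

This is a generalisation of (14.10). $\rho_{12.q}$ is evidently the product-moment coefficient of
correlation between $x_{1.q}$ and $x_{2.q}$.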


15.6. From $p$ variates we can pick out two in $\binom{p}{2}$ ways and find the regression of
each on the other and their correlation; we can also pick out three in $\binom{p}{3}$ ways and find
the regression of each on the other two; and so on. The number of possible regressions
and correlations is thus very large, but they can all be expressed in terms of the variances
of the variates and the correlations between pairs.

We shall call the coefficients with $k$ secondary subscripts regressions, correlations, etc.,
of the $k$th order. The correlation between a pair of variates $\rho_{jk}$ is thus of zero order, and
our result may be stated in the form that coefficients of any order are expressible in terms
of those of zero order. The proof follows from the expressions which we proceed to derive,
giving coefficients of any order in terms of those of lower order. We have

\[ \sum (x^2_{1.23\ldots p}) = \sum (x_{1.23\ldots(p-1)}\,x_{1.23\ldots p}) \]

\[ = \sum \{x_{1.23\ldots(p-1)}(x_1 - \beta_{1p.23\ldots(p-1)}\,x_p - \text{terms in } x_2 \text{ to } x_{p-1})\} \]

\[ = \sum (x^2_{1.23\ldots(p-1)}) - \beta_{1p.23\ldots(p-1)}\sum (x_{1.23\ldots(p-1)}\,x_{p.23\ldots(p-1)}); \tag{15.16} \]

hence, dividing by $N$,

\[ \operatorname{var}(x_{1.23\ldots p}) = \operatorname{var}(x_{1.23\ldots(p-1)}) - \beta_{1p.23\ldots(p-1)}\,\beta_{p1.23\ldots(p-1)}\operatorname{var}(x_{1.23\ldots(p-1)}) \]

\[ = \operatorname{var}(x_{1.23\ldots(p-1)})\,(1 - \rho^2_{1p.23\ldots(p-1)}), \]

which may be regarded as a generalisation of (14.11). By continuing the process we have

\[ \operatorname{var}(x_{1.23\ldots p}) = \operatorname{var}(x_1)(1 - \rho^2_{12})(1 - \rho^2_{13.2})(1 - \rho^2_{14.23}) \ldots (1 - \rho^2_{1p.23\ldots(p-1)}) \tag{15.17} \]

or

\[ \frac{\sigma^2_{1.23\ldots p}}{\sigma^2_1} = (1 - \rho^2_{12})(1 - \rho^2_{13.2}) \ldots (1 - \rho^2_{1p.23\ldots(p-1)}). \tag{15.18} \]

The subscripts of the $\rho$'s can be eliminated in a different order, giving alternative forms
such as

\[ \frac{\sigma^2_{1.23\ldots p}}{\sigma^2_1} = (1 - \rho^2_{13})(1 - \rho^2_{14.3})(1 - \rho^2_{15.34}) \ldots, \text{ etc.} \]

Thus the variance of a residual of order $p - 1$ is expressible in variances of zero order and
correlations of order $p - 2$.



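15.7. Equation (15.4) may be written

\[ \rho_{12}\,\sigma_1\sigma_2 - \beta_{12.34\ldots p}\,\sigma_2^2 - \beta_{13.24\ldots p}\,\rho_{23}\,\sigma_2\sigma_3 - \ldots = 0 \]

\[ \rho_{13}\,\sigma_1\sigma_3 - \beta_{12.34\ldots p}\,\rho_{23}\,\sigma_2\sigma_3 - \beta_{13.24\ldots p}\,\sigma_3^2 - \ldots = 0 \]

etc. Adding the expression for $\sigma^2_{1.23\ldots p}$, i.e.

\[ \sigma_1^2 - \sigma^2_{1.23\ldots p} - \beta_{12.34\ldots p}\,\rho_{12}\,\sigma_1\sigma_2 - \beta_{13.24\ldots p}\,\rho_{13}\,\sigma_1\sigma_3 - \ldots = 0 \]

we have $p$ equations from which, on elimination of the $\beta$'s, there results

\[ \begin{vmatrix}
\sigma_1^2 - \sigma^2_{1.23\ldots p} & \rho_{12}\,\sigma_1\sigma_2 & \rho_{13}\,\sigma_1\sigma_3 & \ldots \\
\rho_{21}\,\sigma_2\sigma_1 & \sigma_2^2 & \rho_{23}\,\sigma_2\sigma_3 & \ldots \\
\rho_{31}\,\sigma_3\sigma_1 & \rho_{32}\,\sigma_3\sigma_2 & \sigma_3^2 & \ldots \\
\ldots & \ldots & \ldots & \ldots
\end{vmatrix} = 0. \]

Dividing the $i$th row by $\sigma_i$ and the $i$th column by $\sigma_i$ we get

\[ \begin{vmatrix}
1 - \dfrac{\sigma^2_{1.23\ldots p}}{\sigma_1^2} & \rho_{12} & \rho_{13} & \ldots & \rho_{1p} \\
\rho_{12} & 1 & \rho_{23} & \ldots & \rho_{2p} \\
\ldots & \ldots & \ldots & \ldots & \ldots \\
\rho_{1p} & \rho_{2p} & \rho_{3p} & \ldots & 1
\end{vmatrix} = 0. \tag{15.19} \]

Write

\[ \omega = \begin{vmatrix}
1 & \rho_{12} & \rho_{13} & \ldots & \rho_{1p} \\
\rho_{12} & 1 & \rho_{23} & \ldots & \rho_{2p} \\
\ldots & \ldots & \ldots & \ldots & \ldots \\
\rho_{1p} & \rho_{2p} & \rho_{3p} & \ldots & 1
\end{vmatrix} \tag{15.20} \]

and $\omega_{11}$ for the minor of the first row and column of this determinant. Then from (15.19)

\[ \omega - \frac{\sigma^2_{1.23\ldots p}}{\sigma_1^2}\,\omega_{11} = 0, \]

i.e.

\[ \frac{\sigma^2_{1.23\ldots p}}{\sigma_1^2} = \frac{\omega}{\omega_{11}}. \tag{15.21} \]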


Generally it may be shown in exactly the same way that 

cov 

, /+1, ...p I, ni + 1, ..p) 


. (15.21) 


. (15.22) 


^Im 

where (Oi^ is the minor of the Zth row and mth column in (15.20). 

This result shows that the variances and covariances of residuals of any order can 
be expressed in terms of the correlations and variances of zero order. 
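
As a numerical illustration of the determinantal form for the variance of a residual, the following Python sketch (not part of the original text) evaluates $\sigma^2_{1.23} = \sigma_1^2\,\omega/\omega_{11}$. The function names are illustrative assumptions, and the zero-order values $\rho_{12} = 0.80$, $\rho_{13} = -0.40$, $\rho_{23} = -0.56$, $\sigma_1 = 4.42$ are taken to be those quoted from Hooker in Example 15.1 of this chapter.

```python
def det(m):
    """Determinant by cofactor expansion along the first row (adequate for small p)."""
    if len(m) == 1:
        return m[0][0]
    total = 0.0
    for j, a in enumerate(m[0]):
        minor = [row[:j] + row[j + 1:] for row in m[1:]]
        total += (-1) ** j * a * det(minor)
    return total


def residual_variance(corr, sigma1):
    """sigma^2_{1.23...p} = sigma_1^2 * omega / omega_11."""
    omega = det(corr)
    omega_11 = det([row[1:] for row in corr[1:]])
    return sigma1 ** 2 * omega / omega_11


# Zero-order correlations assumed from Example 15.1 (hay yield, rainfall, temperature).
corr = [
    [1.00, 0.80, -0.40],
    [0.80, 1.00, -0.56],
    [-0.40, -0.56, 1.00],
]
sigma_1_23 = residual_variance(corr, 4.42) ** 0.5
print(round(sigma_1_23, 2))
```

Cofactor expansion is chosen here only for transparency; for more than a handful of variates an elimination method would be preferable.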


15.8. We have, as in (15.16),

\[ \sum (x_{1.34\ldots p}\,x_{2.34\ldots p}) = \sum (x_{1.34\ldots(p-1)}\,x_{2.34\ldots(p-1)}) - \beta_{2p.34\ldots(p-1)}\sum (x_{1.34\ldots(p-1)}\,x_{p.34\ldots(p-1)}). \]

Substituting

\[ \beta_{2p.34\ldots(p-1)} = \beta_{p2.34\ldots(p-1)}\,\frac{\sigma^2_{2.34\ldots(p-1)}}{\sigma^2_{p.34\ldots(p-1)}} \]

and expressions for the covariances in terms of variances and regressions, and writing
$q$ for the group of secondary suffixes $34 \ldots (p - 1)$, we find

\[ \beta_{12.qp}\,\sigma^2_{2.qp} = \beta_{12.q}\,\sigma^2_{2.q} - \beta_{1p.q}\,\beta_{p2.q}\,\sigma^2_{2.q}, \]

whence, in virtue of (15.16),

\[ \beta_{12.qp} = \frac{\beta_{12.q} - \beta_{1p.q}\,\beta_{p2.q}}{1 - \beta_{2p.q}\,\beta_{p2.q}}, \tag{15.23} \]

expressing the partial regression coefficient in terms of those of next lower order.
Writing down the similar equation for $\beta_{21.qp}$ and taking square roots, we find

\[ \rho_{12.qp} = \frac{\rho_{12.q} - \rho_{1p.q}\,\rho_{2p.q}}{\{(1 - \rho^2_{1p.q})(1 - \rho^2_{2p.q})\}^{\frac{1}{2}}}, \tag{15.24} \]


a fundamental equation giving the correlation coefficient in terms of those of lower orders. 
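
For three variates the formula gives the first-order correlations directly from those of zero order. The Python sketch below is not part of the original text; `partial_corr` is an illustrative name, and the zero-order correlations are taken to be those quoted from Hooker in Example 15.1 of this chapter. The last line checks the product form (15.18) for the ratio of the residual variance to $\sigma_1^2$.

```python
def partial_corr(r12, r1p, r2p):
    """rho_{12.p} from zero-order correlations, following (15.24) with q empty."""
    return (r12 - r1p * r2p) / ((1 - r1p ** 2) * (1 - r2p ** 2)) ** 0.5


# Zero-order correlations assumed from Example 15.1.
r12, r13, r23 = 0.80, -0.40, -0.56

r12_3 = partial_corr(r12, r13, r23)  # yield and rainfall, temperature constant
r13_2 = partial_corr(r13, r12, r23)  # yield and temperature, rainfall constant
r23_1 = partial_corr(r23, r12, r13)  # rainfall and temperature, yield constant

# (15.18): sigma^2_{1.23} / sigma_1^2 = (1 - rho_12^2)(1 - rho_13.2^2).
variance_ratio = (1 - r12 ** 2) * (1 - r13_2 ** 2)

print(round(r12_3, 3), round(r13_2, 3), round(r23_1, 3), round(variance_ratio, 4))
```

Coefficients of higher order follow by applying the same function repeatedly, each time feeding in the coefficients of the order below.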


15.9. From the above results it is clear that the whole complex of partial regressions,
correlations and variances or covariances of residuals is completely determined by the
variances and correlations, or by the variances and regressions, of zero order. It is
interesting to consider this result from the geometrical point of view.

Suppose in fact that we have $N$ sets of observations of $p$ variates

\[ x_{11} \ldots x_{1p}, \quad x_{21} \ldots x_{2p}, \quad \ldots, \quad x_{N1} \ldots x_{Np}. \]

Consider a Euclidean (flat) space of $N$ dimensions. To each set of values $x_{1k}, \ldots, x_{Nk}$
there will correspond one point in this space, and the totality of points representing all
observations will be $p$ in number. (This method of representation, it should be noted,
is not that of $N$ points in a $p$-way space, which was the one used in some of the sampling
discussions in Chapter 10.) Call these points $Q_1, Q_2, \ldots, Q_p$. We will assume that the
$x$'s are measured about their mean, and take the origin to be $P$.

The quantity $N\sigma_l^2$ may then be interpreted as the square of the length of the vector
joining the point $Q_l$ ($= x_{1l}, \ldots, x_{Nl}$) to $P$. Similarly $\rho_{lm}$ may be interpreted as the cosine
of the angle $Q_l P Q_m$, for

\[ \rho_{lm} = \frac{\sum_j x_{jl}\,x_{jm}}{\{\sum_j x_{jl}^2 \sum_j x_{jm}^2\}^{\frac{1}{2}}}, \]

which is the formula for the cosine of the angle between $PQ_l$ and $PQ_m$.

Our result may then be expressed by saying that all the relations connecting the
$p$ points in the $N$-space are expressible in terms of the lengths of the vectors $PQ$ and the
angles between them; and the theory of partial correlation and regression is thus exhibited
as formally identical with the trigonometry of an $N$-dimensional constellation of points.
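
The geometrical interpretation is readily checked on a small example. In the Python sketch below (not part of the original text; the observations are made up and the function names are illustrative), each variate is treated as a vector of $N$ deviations from its mean. The cosine of the angle between two such vectors is compared with the product-moment correlation computed in the usual way, and the squared length of a vector with $N$ times the variance.

```python
import math


def centred(v):
    """Co-ordinates of the point Q: deviations of the N observations from their mean."""
    m = sum(v) / len(v)
    return [a - m for a in v]


def cosine_of_angle(u, v):
    """Cosine of the angle between two vectors drawn from the origin P."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / math.sqrt(sum(a * a for a in u) * sum(b * b for b in v))


def correlation(x, y):
    """Product-moment correlation from variances and covariance (divisor N)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    var_x = sum((a - mx) ** 2 for a in x) / n
    var_y = sum((b - my) ** 2 for b in y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y)) / n
    return cov / math.sqrt(var_x * var_y)


# Hypothetical observations of two variates on N = 6 individuals.
x_l = [3.0, 5.0, 4.0, 8.0, 6.0, 10.0]
x_m = [2.0, 1.0, 4.0, 3.0, 6.0, 5.0]

q_l, q_m = centred(x_l), centred(x_m)
cos_angle = cosine_of_angle(q_l, q_m)
rho = correlation(x_l, x_m)
squared_length = sum(a * a for a in q_l)  # equals N * var(x_l)
print(round(cos_angle, 6), round(rho, 6), squared_length)
```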


15.10* The reader who prefers the geometrical way of looking at this branch of the 
aubjeot will have no difficulty in translating the foregoing equations into trigonometrical 
terminology. We will here indicate only the more important results required for later 
sampling investigations. 

Note in the first place that the p points Q and the point P determine (except perhaps
in degenerate cases) a space of p dimensions in the N-space. Consider the point $Q_{1.2\ldots p}$
whose co-ordinates are the N residuals $x_{1.2\ldots p}$. In virtue of (15.9) the vector $PQ_{1.2\ldots p}$
is orthogonal to each of the vectors $PQ_2 \ldots PQ_p$ and hence to the space of (p − 1) dimensions
defined by P, $Q_2, \ldots Q_p$.

Consider now the residual vectors $PQ_{1.q}$, $PQ_{2.q}$ where q represents the secondary suffixes
34 ... (p − 1). The cosine of the angle between them, say $\theta$, is $\rho_{12.q}$, and each is orthogonal
to the space P, $Q_3, \ldots Q_{p-1}$. Now take M on $PQ_p$ such that $MQ_{1.q}$ and $MQ_{2.q}$
are perpendicular to $PQ_p$. Then $MQ_{1.q}$ is perpendicular to the space P, $Q_3, \ldots Q_p$ and
so is $MQ_{2.q}$. The cosine of the angle between them, say $\phi$, is $\rho_{12.qp}$ (cf. Fig. 15.1). Thus,
to express $\rho_{12.qp}$ in terms of $\rho_{12.q}$ we have to express $\phi$ in terms of $\theta$, or the angle between
the vectors $PQ_{1.q}$ and $PQ_{2.q}$ in terms of that between their projections on the hyperplane
perpendicular to $PQ_p$. We have

$$(Q_{1.q}Q_{2.q})^2 = PQ_{1.q}^2 + PQ_{2.q}^2 - 2\,PQ_{1.q}\,PQ_{2.q}\cos\theta
 = MQ_{1.q}^2 + MQ_{2.q}^2 - 2\,MQ_{1.q}\,MQ_{2.q}\cos\phi.$$

Further

$$PQ_{1.q}^2 = PM^2 + MQ_{1.q}^2$$

and

$$PQ_{2.q}^2 = PM^2 + MQ_{2.q}^2$$

and hence we find

$$\frac{MQ_{1.q}\,MQ_{2.q}}{PQ_{1.q}\,PQ_{2.q}}\cos\phi = -\frac{PM^2}{PQ_{1.q}\,PQ_{2.q}} + \cos\theta. \tag{15.25}$$

Now $MQ_{1.q}/PQ_{1.q}$ is the sine of the angle $MPQ_{1.q}$, the cosine of which angle is $\rho_{1p.q}$,
and similarly for $Q_{2.q}$. Substituting in (15.25) we find

$$\cos\phi = \frac{\cos\theta - \rho_{1p.q}\,\rho_{2p.q}}{\{(1-\rho_{1p.q}^2)(1-\rho_{2p.q}^2)\}^{1/2}} \tag{15.26}$$

which is equation (15.24) in a slightly different form. The expression of a correlation
coefficient in terms of those of the next lowest order is thus capable of interpretation as
the projection of an angle on to a space of one fewer dimensions.
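The projection argument can be illustrated with a small numerical check. This is a sketch under stated assumptions: the three vectors below are hypothetical, already measured about their means, and the function names are introduced only for illustration. It compares the recursion of (15.26) with the cosine obtained by actually projecting two vectors on to the hyperplane perpendicular to the third.

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def cosine(u, v):
    return dot(u, v) / math.sqrt(dot(u, u) * dot(v, v))

def project_out(u, w):
    """Component of u perpendicular to w."""
    k = dot(u, w) / dot(w, w)
    return [a - k * b for a, b in zip(u, w)]

def partial_from_lower_order(r12, r1p, r2p):
    """Recursion of (15.26): cos(phi) from cos(theta) and the two correlations with x_p."""
    return (r12 - r1p * r2p) / math.sqrt((1 - r1p ** 2) * (1 - r2p ** 2))

# Hypothetical vectors PQ_1, PQ_2, PQ_p (each sums to zero).
q1 = [1.0, -1.0, 0.0, 2.0, -2.0]
q2 = [2.0, 0.0, -1.0, 1.0, -2.0]
qp = [0.0, 1.0, -2.0, 2.0, -1.0]

by_formula = partial_from_lower_order(cosine(q1, q2), cosine(q1, qp), cosine(q2, qp))
by_projection = cosine(project_out(q1, qp), project_out(q2, qp))
print(by_formula, by_projection)
```

The two numbers agree to rounding error, which is the content of the statement that a partial correlation is the cosine of a projected angle.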


Example 15.1 

In an investigation into the relationship between weather and crops, Hooker (1907)
found the following means, standard deviations and correlations between the yields of
seeds' hay ($x_1$) in cwts. per acre, the spring rainfall ($x_2$) in inches and the accumulated
temperature above 42°F. in the spring ($x_3$) for an English area over 20 years:

$$\begin{array}{lll}
\bar{x}_1 = 28.02 & \sigma_1 = 4.42 & \rho_{12} = +0.80 \\
\bar{x}_2 = 4.91 & \sigma_2 = 1.10 & \rho_{13} = -0.40 \\
\bar{x}_3 = 594 & \sigma_3 = 85 & \rho_{23} = -0.56
\end{array}$$

The question of primary interest here is the influence of weather on crop yields, and
we consider only the regression of $x_1$ on the other two variates. From the correlations
of zero order it appears that yield and rainfall are positively correlated but that yield and 
accumulated spring temperature are negatively correlated. The question is, what inter¬ 
pretation is to be placed on this latter result? Does high temperature adversely affect 
yields or may the negative correlation be due to the fact that high temperature involves 
less rain, so that the beneficial effect of warmth is more than offset by the harmful effect 
of drought? 

To decide this question, let us calculate the partial correlations and regressions. From
(15.24) we have

$$\rho_{12.3} = \frac{\rho_{12} - \rho_{13}\,\rho_{23}}{\sqrt{(1-\rho_{13}^2)(1-\rho_{23}^2)}}
 = \frac{0.80 - (-0.40)(-0.56)}{\sqrt{\{1-(0.40)^2\}\{1-(0.56)^2\}}} = 0.759.$$

Similarly

$$\rho_{13.2} = 0.097, \qquad \rho_{23.1} = -0.436.$$

We next require the regressions $\beta$ and the variances of residuals. From (15.14) we have

$$\beta_{12.3} = \frac{\operatorname{cov}(x_{1.3},\, x_{2.3})}{\operatorname{var} x_{2.3}}.$$

This, however, involves the calculation of $\sigma_{1.3}$ and $\sigma_{2.3}$, which are not in themselves of interest.
We can obviate the process by noting that from (15.16)

$$\sigma_{1.23} = \sigma_{1.3}(1-\rho_{12.3}^2)^{1/2}, \qquad \sigma_{2.13} = \sigma_{2.3}(1-\rho_{12.3}^2)^{1/2},$$

so that

$$\beta_{12.3} = \rho_{12.3}\,\frac{\sigma_{1.23}}{\sigma_{2.13}}.$$

The standard deviations $\sigma_{1.23}$ and $\sigma_{2.13}$ are of some interest and may be calculated from
(15.18). We have

$$\sigma_{1.23} = \sigma_1(1-\rho_{12}^2)^{1/2}(1-\rho_{13.2}^2)^{1/2} = \sigma_1(1-\rho_{13}^2)^{1/2}(1-\rho_{12.3}^2)^{1/2},$$

the two forms offering a check on each other.

From the first we have

$$\sigma_{1.23} = 4.42\{1-(0.8)^2\}^{1/2}\{1-(0.097)^2\}^{1/2} = 2.64.$$

Similarly

$$\sigma_{2.13} = 0.594, \qquad \sigma_{3.12} = 70.1.$$

Thus

$$\beta_{12.3} = \frac{(0.759)(2.64)}{0.594} = 3.37,$$

and we also find

$$\beta_{13.2} = 0.00364.$$

The regression equation of $x_1$ on $x_2$ and $x_3$ is then

$$x_1 - 28.02 = 3.37(x_2 - 4.91) + 0.00364(x_3 - 594).$$

This equation shows that for increasing rainfall the yield increases and that for increasing
temperature the yield also increases, other things being equal. It enables us to isolate
the effects of rainfall from those of temperature and study each separately. The positive
regression means that there is a positive relation between yield and temperature when
the effect of rainfall is eliminated. The partial correlations tell the same story. Although
$\rho_{13}$ is negative, $\rho_{13.2}$ is positive (though small), indicating that the negative value of $\rho_{13}$ is
due to complications introduced by the rainfall factor.
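The arithmetic of Example 15.1 can be retraced in a few lines. The sketch below assumes the zero-order figures quoted for Hooker's data, taking $\sigma_3 = 85$, which is the value that reproduces the quoted coefficient 0.00364; the function and variable names are illustrative and not part of the text.

```python
import math

def partial(r_ab, r_ac, r_bc):
    """First-order partial correlation of a and b with c held fixed (15.24)."""
    return (r_ab - r_ac * r_bc) / math.sqrt((1 - r_ac ** 2) * (1 - r_bc ** 2))

# Zero-order standard deviations and correlations (sigma_3 = 85 is an assumption).
s1, s2, s3 = 4.42, 1.10, 85.0
r12, r13, r23 = 0.80, -0.40, -0.56

r12_3 = partial(r12, r13, r23)
r13_2 = partial(r13, r12, r23)
r23_1 = partial(r23, r12, r13)

# Residual standard deviations, as in (15.18).
s1_23 = s1 * math.sqrt((1 - r12 ** 2) * (1 - r13_2 ** 2))
s2_13 = s2 * math.sqrt((1 - r12 ** 2) * (1 - r23_1 ** 2))
s3_12 = s3 * math.sqrt((1 - r13 ** 2) * (1 - r23_1 ** 2))

# Partial regression coefficients of x_1 on x_2 and on x_3.
b12_3 = r12_3 * s1_23 / s2_13
b13_2 = r13_2 * s1_23 / s3_12

print(round(r12_3, 3), round(r13_2, 3), round(r23_1, 3))
print(round(s1_23, 2), round(s2_13, 3), round(s3_12, 1))
print(round(b12_3, 2), round(b13_2, 5))
```

Working with unrounded intermediate values, as here, may differ from hand computation in the last printed figure.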

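The foregoing procedure avoids the use of determinantal arithmetic, but the reader
who prefers to do so may use equations (15.21). For example:

$$\omega = \begin{vmatrix} 1 & 0.80 & -0.40 \\ 0.80 & 1 & -0.56 \\ -0.40 & -0.56 & 1 \end{vmatrix} = 0.2448,
\qquad
\omega_{11} = \begin{vmatrix} 1 & -0.56 \\ -0.56 & 1 \end{vmatrix} = 0.6864,$$

from which

$$\sigma_{1.23} = \sigma_1\sqrt{\frac{\omega}{\omega_{11}}} = 2.64 \text{ as before.}$$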
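The determinantal route can be sketched in the same way. The block below assumes the correlation matrix of Example 15.1 and uses a plain cofactor expansion; `det3` is a hypothetical helper written for this illustration.

```python
import math

def det3(m):
    """Determinant of a 3 x 3 matrix by expansion along the first row."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))

corr = [[1.00, 0.80, -0.40],
        [0.80, 1.00, -0.56],
        [-0.40, -0.56, 1.00]]

omega = det3(corr)
# Cofactor of the leading element: delete the first row and first column.
omega_11 = corr[1][1] * corr[2][2] - corr[1][2] * corr[2][1]

sigma_1 = 4.42
sigma_1_23 = sigma_1 * math.sqrt(omega / omega_11)
print(round(omega, 4), round(omega_11, 4), round(sigma_1_23, 2))
```

For more than three variates the same ratio of determinants applies, though a systematic reduction would be preferable to cofactor expansion.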




15.11. When the work involves more than three variables it is desirable to systematise
the arithmetic. Considerable assistance may be derived from tables of quantities such as

$$1-\rho^2, \qquad \sqrt{1-\rho^2}, \qquad \frac{1}{\sqrt{(1-\rho_{12}^2)(1-\rho_{13}^2)}}.$$

Kelley (1916, 1938) and Miner (1922) have given tables for this purpose. Trigonometrical
tables are also useful in some cases. For instance, given $\rho$ we can find $\theta = \cos^{-1}\rho$ and
hence $\sin\theta\ (= \sqrt{1-\rho^2})$, $\operatorname{cosec}\theta\ (= 1/\sqrt{1-\rho^2})$, etc.
For determinantal work some systematic method of reduction such as the Doolittle method
is useful.


Example 15.2 

In some investigations into the variation of crime among cities in the U.S.A., Ogburn
(1935) found a correlation of $-0.14$ between crime rate ($x_1$) as measured by the number
of known offences per thousand inhabitants and church membership ($x_5$) as measured
by the number of church members of 13 years of age or over per 100 of total population
of 13 years of age or over. The obvious inference is that religious belief acts as a deterrent
to crime. Let us consider this more closely.

If $x_2$ = percentage of male inhabitants,
$x_3$ = percentage of total inhabitants who are foreign-born males, and
$x_4$ = number of children under 5 years old per 1000 married women between
15 and 44 years old,

Ogburn finds:

$$\begin{array}{ll}
\rho_{12} = +0.44 & \rho_{24} = -0.19 \\
\rho_{13} = -0.34 & \rho_{25} = -0.35 \\
\rho_{14} = -0.31 & \rho_{34} = +0.44 \\
\rho_{15} = -0.14 & \rho_{35} = +0.33 \\
\rho_{23} = +0.25 & \rho_{45} = +0.85
\end{array}$$

From this and other data given in his paper it may be shown that we have, for the regression
of $x_1$ on the other four variates,

$$x_1 - 19.9 = 4.51(x_2 - 49.2) - 0.88(x_3 - 30.2) - 0.072(x_4 - 481.4) + 0.63(x_5 - 41.6),$$

and for certain partial correlations

$$\rho_{15.3} = -0.03, \qquad \rho_{15.4} = +0.25, \qquad \rho_{15.34} = +0.23.$$

Now we note from the regression equation that when the other factors are constant
$x_1$ and $x_5$ are positively related, i.e. church membership appears to be positively associated
with crime. How does this effect come to be masked so as to give a negative correlation
in the coefficient of zero order $\rho_{15}$?

We note in the first place that the correlation between crime and church membership
when the effect of $x_3$, the percentage of foreigners, is excluded, is near zero. The correlation
when $x_4$, the number of young children, is excluded, is positive; and the correlation when
both $x_3$ and $x_4$ are excluded is again positive. It appears in fact from the regression equation
that a high percentage of foreigners and a high proportion of children act as deterrents
to crime. Now both these factors are positively correlated with church membership
(foreign immigrants being mainly Catholic and more fecund). These correlations submerge
the positive influence on crime of church membership among other members of the
population. The apparently negative effect of church membership appears to be due to
the more law-abiding spirit of the foreign immigrants and the fact that they are also more
zealous churchmen.
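The three partial correlations quoted in Example 15.2 follow from the zero-order table by repeated use of the first-order recursion. The sketch below assumes the zero-order values as tabulated, with $\rho_{45}$ taken as $+0.85$ (the sign consistent with the quoted $\rho_{15.4} = +0.25$); the names are illustrative.

```python
import math

def partial(r_ab, r_ac, r_bc):
    """Partial correlation of a and b with c held fixed."""
    return (r_ab - r_ac * r_bc) / math.sqrt((1 - r_ac ** 2) * (1 - r_bc ** 2))

# Zero-order correlations used here (rho_45 = +0.85 is an assumption about the sign).
r13, r14, r15 = -0.34, -0.31, -0.14
r34, r35, r45 = 0.44, 0.33, 0.85

r15_3 = partial(r15, r13, r35)   # foreign-born males held fixed
r15_4 = partial(r15, r14, r45)   # young children held fixed

# Second order: first remove x_3 from every pair, then remove x_4.
r14_3 = partial(r14, r13, r34)
r45_3 = partial(r45, r34, r35)
r15_34 = partial(r15_3, r14_3, r45_3)

print(round(r15_3, 2), round(r15_4, 2), round(r15_34, 2))
```

The sign of the crime and church-membership correlation changes once either masking variable is held fixed, which is the point of the example.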

The reader may care to refer to Ogburn's paper for a more complete discussion.


The Multivariate Normal Distribution

15.12. We now turn to consider the generalisation of the univariate and bivariate
normal distributions to the case of p variables.

Consider the multivariate distribution

$$dF = y_0 \exp\Big\{-\tfrac12 \sum_{j,k=1}^{p} \alpha_{jk}\,\frac{x_j x_k}{\sigma_j \sigma_k}\Big\}\,\frac{dx_1 \ldots dx_p}{\sigma_1 \ldots \sigma_p}. \tag{15.27}$$

This has p variates and evidently reduces, when p = 1 or 2, to the normal type. We shall
take it to be the generalisation of the normal distribution, and proceed to consider how
the constants $\alpha$ are related to the correlations of the variates. It is, of course, assumed
that the $\alpha$'s are such as to ensure the convergence of the distribution function. For this
it is necessary and sufficient that the quadratic form $\sum \alpha_{jk}\, x_j x_k/(\sigma_j \sigma_k)$ shall be positive-definite,
i.e. that there is a real linear transformation reducing it to the sum of squares of p (or,
in degenerate cases, fewer) new variates.



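Make the transformation

$$\frac{x_j}{\sigma_j} = \sum_{r=1}^{p} l_{rj}\,\xi_r \tag{15.28}$$

and choose the l's so that the exponent of (15.27) becomes $-\tfrac12\Sigma\xi^2$. Then we have

$$\sum_{j,k} \alpha_{jk}\, l_{rj}\, l_{sk} = 1 \text{ if } r = s, \quad = 0 \text{ otherwise,}$$

and hence, writing $(\alpha)$ for the matrix of the quantities $\alpha$, $(l)$ for that of the l's and $(\tilde{l})$ for
the transpose of $(l)$, we have

$$(\alpha)(\tilde{l})(l) = 1. \tag{15.29}$$

Further, the Jacobian of the transformation is $|l|$, the determinant of the l's, and hence
the integral of dF is given by

$$y_0\,|l| \int_{-\infty}^{\infty} \ldots \int_{-\infty}^{\infty} \exp\{-\tfrac12\Sigma\xi^2\}\, d\xi_1 \ldots d\xi_p = (2\pi)^{p/2}\, y_0\, |l|.$$

Hence, since from (15.29) $|\alpha|\,|l|^2 = 1$, we have

$$y_0 = \frac{1}{(2\pi)^{p/2}\,|l|} = \frac{|\alpha|^{1/2}}{(2\pi)^{p/2}}. \tag{15.30}$$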

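Let us now find the characteristic function of the distribution. We have to integrate
over the range of x's the exponential of

$$i\sum_j t_j \frac{x_j}{\sigma_j} - \tfrac12\sum_r \xi_r^2
 = -\tfrac12\sum_r\Big(\xi_r^2 - 2i\,\xi_r\sum_j t_j l_{rj}\Big)
 = -\tfrac12\sum_r\Big(\xi_r - i\sum_j t_j l_{rj}\Big)^2 - \tfrac12\sum_{j,k}\sum_r t_j t_k\, l_{rj} l_{rk}.$$

The first part reduces on integration to a constant. The second gives the exponential
of a series of terms of second degree in t, the coefficient of $t_j t_k$ being

$$-\tfrac12\sum_r l_{rj}\, l_{rk}.$$

Now $\sum_r l_{rj} l_{rk}$ is the element in the jth row and kth column of the matrix $(\tilde{l})(l)$ and hence, from
(15.29), of the matrix $(\alpha)^{-1} = (A)$ say. Hence we may write

$$\phi(t_1, \ldots t_p) = \exp\Big\{-\tfrac12\sum_{j,k} A_{jk}\, t_j t_k\Big\}. \tag{15.31}$$

But when this is expanded the term in $t_j t_k$ is $-\rho_{jk}\,t_j t_k$ by definition and hence $(A)$ is the
matrix $(\omega)$ of equation (15.20). Thus

$$\phi(t_1, \ldots t_p) = \exp\Big\{-\tfrac12\sum_{j,k} \rho_{jk}\, t_j t_k\Big\}.$$

Furthermore,

$$(\alpha) = (A)^{-1} = (\omega)^{-1} \tag{15.32}$$

and hence the distribution itself may be written in the form

$$dF = \frac{1}{(2\pi)^{p/2}\,\omega^{1/2}}\exp\Big\{-\frac{1}{2\omega}\sum_{j,k}\omega_{jk}\,\frac{x_j x_k}{\sigma_j\sigma_k}\Big\}\,\frac{dx_1 \ldots dx_p}{\sigma_1 \ldots \sigma_p}. \tag{15.33}$$

For example, with the bivariate form

$$(\omega) = \begin{pmatrix} 1 & \rho \\ \rho & 1 \end{pmatrix}$$

and hence $\omega = 1 - \rho^2$, $\omega_{11} = \omega_{22} = 1$, $\omega_{12} = \omega_{21} = -\rho$, so that the distribution becomes
the familiar form

$$dF = \frac{1}{2\pi(1-\rho^2)^{1/2}}\exp\Big\{-\frac{1}{2(1-\rho^2)}\Big(\frac{x_1^2}{\sigma_1^2} - \frac{2\rho\, x_1 x_2}{\sigma_1\sigma_2} + \frac{x_2^2}{\sigma_2^2}\Big)\Big\}\,\frac{dx_1\, dx_2}{\sigma_1\sigma_2}.$$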


15.13. For any fixed $x_2 \ldots x_p$ the exponent of (15.33) reduces to the normal
univariate form in $x_1$ with mean

$$-\frac{\sigma_1}{\omega_{11}}\Big(\omega_{12}\frac{x_2}{\sigma_2} + \ldots + \omega_{1p}\frac{x_p}{\sigma_p}\Big). \tag{15.34}$$

Thus the regression of $x_1$ on the other variates is exactly linear. The variance of $x_1$ in any
array is $\omega\sigma_1^2/\omega_{11}$ and the distribution is thus homoscedastic. It follows generally that the
regression of any variate on any or all of the others is linear. Comparing (15.33) with
(15.21) we see that the distribution may be written

$$dF = \frac{1}{(2\pi)^{p/2}\,\omega^{1/2}\,\sigma_1 \ldots \sigma_p}\exp\Big\{-\tfrac12\Big(\sum_r \frac{x_r^2}{\sigma_{r.12\ldots p}^2} - \sum_{r \neq s}\rho_{rs.12\ldots p}\,\frac{x_r x_s}{\sigma_{r.12\ldots p}\,\sigma_{s.12\ldots p}}\Big)\Big\}\,dx_1 \ldots dx_p \tag{15.35}$$

where the secondary suffixes in the $\rho$ and $\sigma$'s do not, of course, contain r and s.

Since every x is normally distributed, every linear function of x is so, as may be seen
at once from (15.33). In particular the residuals are normally distributed.

If in (15.33) we make the substitution

$$\zeta_1 = x_1, \quad \zeta_2 = x_{2.1}, \quad \zeta_3 = x_{3.21}, \quad \zeta_4 = x_{4.321}, \ldots$$

the exponent will be a quadratic function of the $\zeta$'s. In this function all product terms
$\zeta_j\zeta_k$ must vanish, for the covariance of $\zeta_j$ and $\zeta_k$ vanishes in virtue of the remark at the
end of 15.4. It follows that the distribution function may be written in the form

$$dF = \frac{1}{(2\pi)^{p/2}\,\sigma_1\sigma_{2.1}\sigma_{3.21}\ldots}\exp\Big\{-\tfrac12\Big(\frac{x_1^2}{\sigma_1^2} + \frac{x_{2.1}^2}{\sigma_{2.1}^2} + \frac{x_{3.21}^2}{\sigma_{3.21}^2} + \ldots\Big)\Big\}\,dx_1\,dx_{2.1}\,dx_{3.21}\ldots \tag{15.36}$$

From this it appears that the joint distribution of any two residuals $x_{j.q}$ and $x_{k.q}$ is of
the bivariate normal form with correlation $\rho_{jk.q}$. Consider, for example, $x_{2.1}$ and $x_{3.21}$.
Each is normally distributed and is uncorrelated with and independent of the other variables
in (15.36). If $x_{3.21}$ is expressed in terms of residuals of the second order, i.e. as $x_{3.1} - \beta_{32.1}x_{2.1}$,
the joint distribution of $x_{2.1}$ and $x_{3.1}$ becomes of the bivariate form with correlation $\rho_{23.1}$;
and so generally.

These results are important in the interpretation of regressions and correlations in
the normal case. In the general case a coefficient such as $\rho_{jk.q}$ represents the average
dependence, so to speak, of $x_{j.q}$ and $x_{k.q}$, being based on the sum $\Sigma(x_{j.q}\,x_{k.q})$. In the
normal case $\rho_{jk.q}$ is constant for all the sub-populations corresponding to particular assigned
values of the other variables.


Sampling Distributions of Partial Correlation and Regression Coefficients

15.14. We now consider the sampling distributions of the coefficients of partial
correlation and regression. For large samples the values of Chapter 14 appropriate to
correlations and regressions of zero order may be used (subject to the proviso as to the
unreliability of the standard error for $\rho$ unless the sample is very large). For example, the
variance of $r_{jk.q}$ in the normal case is given by

$$\operatorname{var}(r_{jk.q}) = \frac{1}{n}(1-\rho_{jk.q}^2)^2 \tag{15.37}$$

where n is the sample number; and that of the regression coefficient by

$$\operatorname{var}(b_{jk.q}) = \frac{1}{n}\,\frac{\sigma_{j.kq}^2}{\sigma_{k.q}^2}. \tag{15.38}$$

The proof of these results by the direct methods of Chapter 9 is a very tedious piece of
algebra. They follow simply, however, from the remark of the previous section that the
correlation between any two deviations $x_{j.q}$ and $x_{k.q}$ is of the normal type with coefficient
$\rho_{jk.q}$; for it follows that $r_{jk.q}$ is distributed as the correlation between two normal variates.
Similar considerations apply to the regression coefficients. It will be shown presently
that if the original distribution was based on n observations, that of $r_{jk.q}$ is of the form
of the correlation $r_{jk}$ based on n − s observations, where s is the number of secondary
subscripts in q; but as our equations are only true to order $n^{-1}$ the divisor in (15.37)
and (15.38) may remain at n without further error.


15.15. Consider now the geometrical representation of 15.9. Suppose we have
three points Q, R, S in the n-fold space, represented by $x_1 \ldots x_n$, $y_1 \ldots y_n$, $z_1 \ldots z_n$
respectively, the origin being P and the variables measured from their mean. Then the
coefficient of correlation between x and y is the cosine of the angle QPR, that between
y and z the cosine of RPS and that between z and x the cosine of SPQ. Now imagine a
sphere described with unit radius and centre P, cutting PQ, PR and PS in Q', R', S'. Then
will the partial correlation $r_{xy.z}$ be the cosine of the angle of the spherical triangle Q'S'R',
and so for the other two partial correlations. This was, in effect, proved in 15.10, for the
angle Q'S'R' is the angle between the projections of PQ and PR upon the space perpendicular
to PS.

Now we may make an orthogonal transformation, corresponding to a rotation of the
co-ordinate axes, without affecting the correlations; moreover, if the n values of one
variate x are independent and normally distributed so will be the n values of the transformed
variates. Let us then make such a transformation and take PS as one of the new
co-ordinate axes. It is then apparent that the distribution of $r_{xy.z}$, which is the cosine
of an angle in the space perpendicular to PS, is the same in form as that of $r_{xy}$ except that,
being in (n − 1) dimensions, it is based on (n − 1) independent pairs of normally distributed
variates instead of n.

Hence for samples from a normal population the distribution of the partial correlation
coefficient of the first order from n sets of observations is the same as that of a correlation
of zero order from (n − 1) sets of observations. By a repetition of the same argument
it follows that the distribution of a correlation coefficient of the sth order is that of the
correlation of zero order from (n − s) sets of observations. The results of the previous
chapter are thus immediately applicable to partial correlations. If, of course, s is small
compared with n, the distribution of partials is sensibly the same as that of ordinary
correlations, which confirms the approximation of the previous section.



The Multiple Correlation Coefficient

15.16. As in 14.22, the multivariate regression equation can be used to estimate
the values of one variate from given values of the others; but in order to see how good
such an estimate is likely to be we require to know whether the values “predicted” by
the regression equation are in close relationship to the observed values. Consider the
regression of $x_1$ on the other variates:

$$x_1 = \beta_{12.3\ldots p}\,x_2 + \beta_{13.24\ldots p}\,x_3 + \ldots + \beta_{1p.23\ldots(p-1)}\,x_p = e_{1.23\ldots p}, \text{ say.} \tag{15.39}$$

If we substitute an observed set of values $x_2 \ldots x_p$ we shall get a quantity $e_{1.23\ldots p}$,
say, differing from the observed $x_1$ by the residual quantity $x_{1.23\ldots p}$, so that

$$x_1 - e_{1.23\ldots p} = x_{1.23\ldots p}. \tag{15.40}$$

We may then judge of the accuracy of the representation of the observed $x_1$ by the
regression equation by correlating $x_1$ and $e_{1.23\ldots p}$. We have

$$\Sigma(e_{1.23\ldots p}^2) = \Sigma(x_1 - x_{1.23\ldots p})^2 = N(\sigma_1^2 - \sigma_{1.23\ldots p}^2) \tag{15.41}$$

and

$$\Sigma(x_1\,e_{1.23\ldots p}) = \Sigma(x_1^2) - \Sigma(x_1\,x_{1.23\ldots p}) = N(\sigma_1^2 - \sigma_{1.23\ldots p}^2). \tag{15.42}$$

Hence the correlation between $x_1$ and $e_{1.23\ldots p}$, say $R_{1(2\ldots p)}$, is given by

$$R_{1(2\ldots p)} = \frac{\operatorname{cov}(x_1,\, e_{1.23\ldots p})}{(\operatorname{var} x_1\, \operatorname{var} e_{1.23\ldots p})^{1/2}} = \frac{(\sigma_1^2 - \sigma_{1.23\ldots p}^2)^{1/2}}{\sigma_1},$$

giving

$$R_{1(2\ldots p)}^2 = 1 - \frac{\sigma_{1.23\ldots p}^2}{\sigma_1^2}. \tag{15.43}$$

$R_{1(2\ldots p)}$ is called the Multiple Correlation Coefficient between $x_1$ and $x_2 \ldots x_p$. We
have, similarly, multiple correlation coefficients of any variate on some or all of the others,
e.g. $R_{1(23)}$, $R_{4(123)}$.

Two alternative forms of R are worth noticing. From (15.43) and (15.21) we have

$$1 - R_{1(2\ldots p)}^2 = \frac{\omega}{\omega_{11}} \tag{15.44}$$

and from (15.43) and (15.17)

$$1 - R_{1(2\ldots p)}^2 = (1-\rho_{12}^2)(1-\rho_{13.2}^2) \ldots (1-\rho_{1p.2\ldots(p-1)}^2). \tag{15.45}$$


15.17. From the latter equation it follows that since no $\rho$ is greater than unity, R
must be at least as great as the absolute value of any $\rho$ entering into (15.45). R itself is
essentially positive, for $\sigma_1 \geq \sigma_{1.23\ldots p}$ (equation (15.18)).

It follows that if R = 0 all the constituent $\rho$'s must be zero, and conversely. In this
case $x_1$ is completely uncorrelated with any of the other variates and the regression equation
is quite useless as a means of estimating the dependent variable.

On the other hand, if R = 1 the correlation between the observed and the estimated
value given by the regression equation is perfect, i.e. $x_1$ is a linear function of the other
variates. R thus provides a measure of the relationship between $x_1$ and the remaining
variates.
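The two alternative forms (15.44) and (15.45) can be compared numerically. The following sketch assumes the three zero-order correlations of Example 15.1 and expands the 3 x 3 determinant by hand; variable names are illustrative only.

```python
import math

def partial(r_ab, r_ac, r_bc):
    """First-order partial correlation of a and b with c held fixed."""
    return (r_ab - r_ac * r_bc) / math.sqrt((1 - r_ac ** 2) * (1 - r_bc ** 2))

r12, r13, r23 = 0.80, -0.40, -0.56

# Product form, as in (15.45): (1 - rho_12^2)(1 - rho_13.2^2).
r13_2 = partial(r13, r12, r23)
one_minus_r2_product = (1 - r12 ** 2) * (1 - r13_2 ** 2)

# Determinantal form, as in (15.44): omega / omega_11 for three variates.
omega = 1 + 2 * r12 * r13 * r23 - r12 ** 2 - r13 ** 2 - r23 ** 2
omega_11 = 1 - r23 ** 2
one_minus_r2_det = omega / omega_11

multiple_r = math.sqrt(1 - one_minus_r2_det)
print(one_minus_r2_product, one_minus_r2_det, round(multiple_r, 3))
```

The multiple correlation obtained is only slightly larger than $|\rho_{12}|$ here, because $\rho_{13.2}$ is small and so the factor $(1-\rho_{13.2}^2)$ is close to unity.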



15.18. The coefficient R has an interesting geometrical interpretation. It was noted
in 15.10 that the residual vector $PQ_{1.2\ldots p}$ is orthogonal to the space of (p − 1) dimensions
defined by P, $Q_2 \ldots Q_p$. Consequently the angle between the vector $PQ_1$ and the
(p − 1)-dimensional space P, $Q_2 \ldots Q_p$ is the complement of the angle $Q_1PQ_{1.2\ldots p}$, that
is to say, the sine of the latter angle (and the cosine of the former) is $R_{1(2\ldots p)}$. From this
standpoint we see that if R = 0, $PQ_1$ is also orthogonal to the space P, $Q_2, \ldots Q_p$, i.e. that
$x_1$ is uncorrelated with $x_2 \ldots x_p$. If R = 1, $PQ_1$ lies in the space and $x_1$ is linearly
dependent on $x_2 \ldots x_p$.

15.19. The coefficient R, as mentioned in 14.24, is analogous to the correlation
ratio $\eta$, and in fact from some points of view the two are formally identical. Given a set
of variate-values we may consider the variance of $x_1$ as composed of the sum of two variances,
for we have, by definition,

$$\operatorname{var} x_1 = \sigma_1^2 = \sigma_1^2 - \sigma_{1.2\ldots p}^2 + \sigma_{1.2\ldots p}^2
 = \operatorname{var} e_{1.2\ldots p} + \operatorname{var}(x_1 - e_{1.2\ldots p}). \tag{15.46}$$

Thus the variance of $x_1$ may be regarded as the sum of the variances (1) of the deviations
of $x_1$ from the values given by the regression equation, and (2) of those values themselves.
We may write (15.46) as

$$\operatorname{var}(x_1) = \sigma_1^2 R_{1(2\ldots p)}^2 + \sigma_1^2(1 - R_{1(2\ldots p)}^2). \tag{15.47}$$

Now consider again equation (14.76) in the form

$$\operatorname{var} x = \operatorname{var} x\,\{\eta_{xy}^2 + 1 - \eta_{xy}^2\}
 = \sigma_x^2\,\eta_{xy}^2 + \sigma_x^2(1 - \eta_{xy}^2). \tag{15.48}$$

The relation with (15.47) is evident. It is redeemed from triviality by the fact that, just
as the two parts on the right-hand side of (15.48) are independent in samples from an
uncorrelated normal population, so are those in (15.47) in samples from a multivariate
normal population for which the parent R is zero. For in that case $x_1$ is independent of
the other variables and therefore deviations of $x_1$ from the regression values are independent
of the deviations of those values about their mean.


15.20. From this fact we can derive the sampling distribution of R (the sample
value of the multiple correlation coefficient) when $\bar{R}$ (the population value) is zero and the
population is normal. In fact, as in 14.24, we see that $(1 - R^2)/R^2$ is the quotient of two
independent variables. The numerator is distributed in the Type III form with N − p degrees
of freedom, for it is a multiple of the variance of $x_{1.23\ldots p}$; var $x_1$ will be distributed
as the sum of the squares of N variates about their mean, i.e. with N − 1 degrees of freedom,
var $x_{1.2}$ with N − 2 degrees of freedom, and so on, every additional subscript lowering
the degrees of freedom by unity, as in 15.15. Further, the denominator is distributed
in the Type III form with p − 1 degrees of freedom, for it is the difference of var $x_1$, which
has N − 1 degrees, and var $x_{1.2\ldots p}$, which has N − p degrees.* Thus the distribution
of $R^2$ is formally the same as (14.85) with $R^2$ instead of $\eta^2$, i.e. is

$$dF = \frac{1}{B\{\tfrac12(p-1),\ \tfrac12(N-p)\}}\,(R^2)^{\frac12(p-3)}\,(1-R^2)^{\frac12(N-p-2)}\,d(R^2). \tag{15.49}$$

* It is not, of course, true in general that the difference of two Type III variates is distributed
in the Type III form. In the present case we can find an orthogonal transformation of the variables
x to new independent normal variables, of which one may be taken to be the residual $x_{1.23\ldots p}$,

This can be reduced to the z-form by writing

$$z = \tfrac12\log_e\Big\{\frac{R^2}{1-R^2}\cdot\frac{N-p}{p-1}\Big\}, \qquad \nu_1 = p - 1, \quad \nu_2 = N - p. \tag{15.50}$$

The mean value of $R^2$ is the positive quantity (p − 1)/(N − 1).
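The stated mean of $R^2$ under a zero parent value can be checked by integrating the density numerically. This is a minimal sketch, assuming the null density of $R^2$ has the beta form with parameters $\tfrac12(p-1)$ and $\tfrac12(N-p)$, and taking $p = 3$, $N = 20$ purely for illustration; the midpoint rule is used only because it needs no library beyond `math`.

```python
import math

def null_density(u, p, N):
    """Assumed null density of u = R^2: beta form with a = (p-1)/2, b = (N-p)/2."""
    a, b = (p - 1) / 2, (N - p) / 2
    beta = math.gamma(a) * math.gamma(b) / math.gamma(a + b)
    return u ** (a - 1) * (1 - u) ** (b - 1) / beta

def midpoint(f, steps=20000):
    """Midpoint-rule integral of f over (0, 1)."""
    h = 1.0 / steps
    return sum(f((i + 0.5) * h) for i in range(steps)) * h

p, N = 3, 20
total = midpoint(lambda u: null_density(u, p, N))
mean_r2 = midpoint(lambda u: u * null_density(u, p, N))
print(total, mean_r2, (p - 1) / (N - 1))
```

Even with no relationship in the population, the expected sample $R^2$ is positive and grows with the number of variates, which is why a test of significance is needed.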

15.21. We proceed to find the distribution of R in samples from a normal multivariate
population when $\bar{R}$ is not zero. Two preliminary remarks are necessary.

In the first place, any multivariate normal population can, by a linear transformation,
be transformed to new variates which are normally distributed and independent. One
such transformation has been given in 15.13.

Secondly, any linear transformation leaves the multiple correlation coefficient invariant,
that is to say, the coefficient between $x_1$ and $x_2 \ldots x_p$ is the same as that between
$x_1$ and the transformed variables $\xi_2 \ldots \xi_p$. Referring to (15.43) we see that, apart
from the constant $\sigma_1^2$, $R^2$ depends only on $\sigma_{1.2\ldots p}^2$ and since the regressions are
chosen so as to minimise this quantity, the same minimum is reached whether we use the
variables $x_2 \ldots x_p$ or the linearly related variables $\xi_2 \ldots \xi_p$. Conversely, if the correlation
between $x_1$ and $\xi_2$ is a maximum for all possible sets of $\xi$'s, then that correlation is
the multiple correlation coefficient between $x_1$ and the $\xi$'s, and $x_1$ is uncorrelated with
$\xi_3 \ldots \xi_p$.

From the geometrical standpoint of 15.10, let us take the sample vectors $PQ_2 \ldots PQ_p$
and in the space defined by these vectors choose another set $PS_2 \ldots PS_p$ which are
mutually orthogonal. These will correspond to the transformed variates $\xi$, and the angle
between $PQ_1$ and the space remains unaltered, i.e. R is invariant.

Let us now choose $\xi_2$ so that the correlation between $x_1$ and $\xi_2$ is a maximum in the
population. Then if $PS_2$ is the sample vector corresponding to $\xi_2$, $PQ_1$ will be orthogonal
to all the other vectors $PS_3 \ldots PS_p$ (since $x_1$ is then independent of $\xi_3 \ldots \xi_p$).

In any given sample the correlation between $x_1$ and $\xi_2$ will not be equal, in general,
to $\bar{R}$ (though the correlation in the population is $\bar{R}$), but to a quantity r, say, varying from
sample to sample and equal to $\cos Q_1PS_2$. Let PT be the vector representing the sampling
regression formula $\sum_{j=2}^{p} b_j\xi_j$. This will lie in the $\xi$-space (cf. Fig. 15.2). Then
PT is such as to make the least possible angle with $Q_1P$, the angle being $\cos^{-1}R$, and
the perpendicular from $Q_1$ on to the $\xi$-space meets PT in a point, say K. Let the
point L be taken on $PS_2$ such that LK is perpendicular to PK and join $Q_1L$. Let the
angle KPL be $\psi$.

Then

$$Q_1L^2 = Q_1P^2 + PL^2 - 2\,Q_1P\cdot PL\; r
 = Q_1K^2 + KL^2
 = Q_1P^2 - PK^2 + PL^2 - PK^2$$

and hence

$$r = \frac{PK^2}{Q_1P\cdot PL} = \frac{PK}{Q_1P}\cdot\frac{PK}{PL} = R\cos\psi. \tag{15.51}$$

R and $\psi$ are independent.

Now we consider the sampling distribution of the correlation coefficient r. It is to
be remembered that $x_1$ and $\xi_2$ are distributed in the bivariate normal form. The distribution
of r is then given by the formulae of 14.14 and may be written



. (16.52) 


since $\bar{R}$ is the population value of r. If $\bar{R} = 0$ the second factor in (15.52) reduces to
unity. We may therefore regard the second factor as the effect on the frequency density
in the region dr of a population correlation $\bar{R}$. Now we have already seen that when
$\bar{R} = 0$ the distribution of R corresponding to that of r is given by (15.49). We have
then to find by what factor (15.49) is to be multiplied to allow for the variable frequency
density. Such factor is the second part on the right-hand side of (15.52) with $R\cos\psi$
substituted for r (as given by (15.51)) and integrated over the permissible domain of $\psi$.

It will be seen from Fig. 15.2 that for fixed P and $S_2$, T may vary in the space of (p − 1)
dimensions determined by P and $S_2 \ldots S_p$; and for constant $\psi$ it will lie on the cone
in that space obtained by rotating TP about $PS_2$. The element of area cut off by this
cone on the unit hypersphere is proportional to its solid angle, that is to $\sin^{p-3}\psi$, and
hence the frequency of $\psi$ in the range $d\psi$ is

$$\frac{\sin^{p-3}\psi\; d\psi}{B\{\tfrac12(p-2),\ \tfrac12\}}, \qquad 0 \leq \psi \leq \pi.$$



Thus the density factor is 


. (15.63) 




Finally the distribution of R is


dF 


r{^ 


r(n~^p\r/p -^ (1 - - B*r^h(R*) 

n * sin *'"® yi dz df 

(cosh z — RB cos y>) 


,n-l 


. (16.55) 


This may be expressed as a hypergeometric function. Expanding the integrand in
powers of $\cos\psi$ we have, since odd powers vanish on integration,

» /n + 2 j~2 \ sin »>-» v- cos y>(Djt\n 

2 j\ 2 j y ■(coshz)'-'»+2? ^ 


and since 


and 


the integral becomes 


J cos ip sin v-^y) dip = b{^—^ ~ 

f* _ «/l n^ 2 j - 1 \ 

J_* cosh \2’ 2 / 


^ r ') - t -) Ks- 


(16.56) 


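whence we find, from (15.55), after a little further reduction,

$$dF = \frac{(1-\bar{R}^2)^{\frac12(n-1)}}{B\{\tfrac12(p-1),\ \tfrac12(n-p)\}}\,(R^2)^{\frac12(p-3)}\,(1-R^2)^{\frac12(n-p-2)}\,d(R^2)
 \times F\{\tfrac12(n-1),\ \tfrac12(n-1),\ \tfrac12(p-1),\ R^2\bar{R}^2\}. \tag{15.57}$$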


15.22. Writing $a = \tfrac12(p-1)$, $b = \tfrac12(n-p)$ we have

<' ~ ». «■*') ■ (15.58) 

- r^jm <*’)■“ <* - ^ ■'’< - “• ''J'’ • <‘5.50) 

It may be shown that

$$\mu_1'(R^2) = 1 - \frac{b}{a+b}\,(1-\bar{R}^2)\,F(1,\ 1,\ a+b+1,\ \bar{R}^2). \tag{15.60}$$

In particular, when $\bar{R} = 0$ we have the known result

$$\mu_1'(R^2) = \frac{a}{a+b} = \frac{p-1}{n-1}. \tag{15.61}$$
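The behaviour of the mean of $R^2$ can be explored by summing the hypergeometric series directly. The sketch below assumes that (15.60) has the form $1 - \frac{b}{a+b}(1-\bar{R}^2)F(1, 1, a+b+1, \bar{R}^2)$ with $a = \tfrac12(p-1)$, $b = \tfrac12(n-p)$; the function names, the truncation at 500 terms, and the trial values of p, n and $\bar{R}^2$ are all assumptions made for illustration.

```python
def hypergeometric_f(a, b, c, x, terms=500):
    """Partial sum of the Gauss hypergeometric series F(a, b, c, x) for |x| < 1."""
    term, total = 1.0, 1.0
    for j in range(terms):
        term *= (a + j) * (b + j) / ((c + j) * (j + 1)) * x
        total += term
    return total

def mean_r_squared(p, n, parent_r2):
    """Mean of the sample R^2 under the assumed form of (15.60)."""
    a, b = (p - 1) / 2, (n - p) / 2
    return 1 - b / (a + b) * (1 - parent_r2) * hypergeometric_f(1, 1, a + b + 1, parent_r2)

null_mean = mean_r_squared(3, 20, 0.0)      # should agree with (p - 1)/(n - 1)
biased_mean = mean_r_squared(3, 20, 0.5)    # sample R^2 tends to exceed the parent value
print(null_mean, biased_mean)
```

The series converges slowly as $\bar{R}^2$ approaches unity, so a larger number of terms would be needed there.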



For large n we have approximately




a + (6 - + R* 

a -}- 6 4* 4 


For the second moment 






or approximately

$$\mu_2(R^2) = \frac{4\bar{R}^2(1-\bar{R}^2)^2}{n}$$


. (15.62) 


. (15.63) 

. (15.64) 


which, however, breaks down near $\bar{R} = 0$. It would, in fact, appear that the distribution
of R tends to normality when $\bar{R} \neq 0$ but not when $\bar{R} = 0$ (cf. Exercise 15.6).


Example 15.3 

From Example 15.1 we have found $\omega = 0.2448$, $\omega_{11} = 0.6864$, from which we have

$$R^2 = 1 - \frac{0.2448}{0.6864} = 0.6433,$$

indicating that the regression equation is a fairly close representation of the data, since
R, the correlation between observed $x_1$'s and those provided by the equation, is high,
about 0.80.

It is hardly necessary to test the significance of such a value, but we will do so to illustrate
the arithmetic involved. If $x_1$ were uncorrelated with the other variates we should
have $\bar{R} = 0$, and on the assumption that the population is normal (a reasonable assumption
for crop yields, sunshine and rainfall records) we may use equation (15.50). We have,
since p = 3, n = 20,

$$z = \tfrac12\log_e\Big\{\frac{0.6433}{0.3567}\cdot\frac{17}{2}\Big\} = 1.36, \qquad \nu_1 = 2, \quad \nu_2 = 17.$$

From Appendix Table 6 the 1 per cent. significance point of z for $\nu_1 = 2$, $\nu_2 = 17$ is 0.9051,
so that the observed R is almost certainly significant, z being much greater than can be
accounted for by sampling alone.
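The arithmetic of this test is short enough to script. The sketch below assumes the determinants of Example 15.1 and the z-transformation of (15.50); the critical value is not computed here and must still be taken from a table of z.

```python
import math

omega, omega_11 = 0.2448, 0.6864
p, n = 3, 20

r_squared = 1 - omega / omega_11            # multiple correlation squared
nu_1, nu_2 = p - 1, n - p                   # degrees of freedom

# z of (15.50): half the natural log of the variance ratio.
variance_ratio = r_squared / (1 - r_squared) * nu_2 / nu_1
z = 0.5 * math.log(variance_ratio)

print(round(r_squared, 4), nu_1, nu_2, round(z, 3))
```

The quantity `variance_ratio` is the familiar ratio of the regression mean square to the residual mean square, so the same test could equally be read from a table of that ratio.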


NOTES AND REFERENCES 

The theory of partial correlation is mainly due to Yule (1907). The reader may refer
to M. Ezekiel's book (1930) for a detailed discussion of the practical side of correlation
analysis. See also a paper on the theoretical side by Frisch (1929).

For a knowledge of the sampling properties of the partial correlations we are indebted
to Yule (1907), who pointed out the applicability of large sampling, “normal” formulae
for coefficients of zero order to the partial coefficients, and to R. A. Fisher (1924), who
is responsible for the exact result for small samples from a normal population and the
distribution of the multiple correlation coefficient (1928). Some approximate results for
the latter had been obtained by Isserlis (1917) and P. Hall (1927). Wishart (1931, 1932)
has studied the exact distribution of R and the formally equivalent $\eta$. Both of Fisher's
papers are notable examples of the power of the geometrical method of deducing sampling
distributions.

In comparing formulae given by various writers it is as well to examine whether the
total number of variates (our p) or the number of dependent variates (p − 1) is being
used as a constant in the equations.

Ezekiel, M. (1930), Methods of Correlation Analysis, Chapman and Hall, London; John
Wiley and Sons, New York.

Fisher, R. A. (1924), “The distribution of the partial correlation coefficient,” Metron,
3, 329.

Fisher, R. A. (1928), “The general sampling distribution of the multiple correlation coefficient,”
Proc. Roy. Soc., A, 121, 654.

Frisch, R. (1929), “Correlation and scatter in statistical variables,” Nordic Statistical
Journal, 1, 36.

Hall, P. (1927), “Multiple and partial correlation coefficients in the case of an n-fold variate
system,” Biometrika, 19, 100.

Hooker, R. H. (1907), “The correlation of the weather and the crops,” Jour. Roy. Stat.
Soc., 65, 1.

Isserlis, L. (1914), “On the partial correlation ratio. Part I, Theoretical,” Biometrika,
10, 391, and “Part II, Numerical,” ibid. (1916), 11, 50.

Isserlis, L. (1917), “The variation of the multiple correlation coefficient in samples drawn from
an infinite population with normal distribution,” Phil. Mag., 34, 205.

Kelley, T. L. (1916), “Tables to facilitate the calculation of partial coefficients of correlation
and regression equations,” Bulletin of the University of Texas, No. 127.

Kelley, T. L. (1938), The Kelley Statistical Tables, Macmillan.

Miner, J. R. (1922), “Tables of $\sqrt{1-r^2}$ and $1-r^2$ for use in partial correlations, etc.,”
Johns Hopkins Press, Baltimore.

Ogburn, W. F. (1935), “Factors in the variation of crime among cities,” Jour. Amer.
Statist. Ass., 30, 12.

Wishart, J. (1931), “The mean and second-moment coefficient of the multiple correlation
coefficient in samples from a normal population,” Biometrika, 22, 353.

Wishart, J. (1932), “Note on the correlation ratio,” Biometrika, 23, 441.

Yule, G. U. (1907), “On the theory of correlation for any number of variables treated
by a new system of notation,” Proc. Roy. Soc., A, 79, 182.


EXERCISES 


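15.1. Show that

$$\beta_{12.34\ldots(p-1)} = \frac{\beta_{12.34\ldots p} + \beta_{1p.23\ldots(p-1)}\,\beta_{p2.13\ldots(p-1)}}{1 - \beta_{2p.13\ldots(p-1)}\,\beta_{p2.13\ldots(p-1)}}$$

and that

$$\rho_{12.34\ldots(p-1)} = \frac{\rho_{12.34\ldots p} + \rho_{1p.23\ldots(p-1)}\,\rho_{2p.13\ldots(p-1)}}{(1-\rho_{1p.23\ldots(p-1)}^2)^{1/2}\,(1-\rho_{2p.13\ldots(p-1)}^2)^{1/2}}.$$

(Yule, 1907.)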



15.2. Show that for p variates there are $\binom{p}{2}$ correlation coefficients of order zero
and $\binom{p-2}{s}\binom{p}{2}$ of order s. Show further that there are $\binom{p}{2}2^{p-2}$ correlation coefficients
altogether and $\binom{p}{2}2^{p-1}$ regression coefficients.


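15.3. Show that for given $\rho_{12}$ and $\rho_{13}$, $\rho_{23}$ must lie in the range

\[ \rho_{12}\rho_{13} \pm \left(1 - \rho_{12}^2 - \rho_{13}^2 + \rho_{12}^2\rho_{13}^2\right)^{1/2} \]

and that if $x_1$ and $x_2$, and $x_1$ and $x_3$, are uncorrelated no inference can be drawn from that fact
as to the correlation between $x_2$ and $x_3$.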

15.4. Show that if $\rho_{12}$ be zero, $\rho_{12.3} \neq 0$ unless at least one of $\rho_{13}$, $\rho_{23}$ is zero.

15.5. If the correlations of zero order among a set of variables are all equal to $\rho$,
show that every partial correlation of the sth order is $\dfrac{\rho}{1 + s\rho}$.

15.6. Show that the distribution of the multiple correlation coefficient B tends, 
in normal samples, for large n, to the form 

(iz3 

dF = exp {- iR* - m 

X 

where /3* ^ R^{n — p), JS* == B^{n — p). 

In particular, where p = 4, 

dF = [exp {- - m - exp {- i(R + ^)*}] dB. 

Thus, when /S = 0 the distribution of B does not tend to normality, but when /5 is not 
zero and is thus large for finite JR, R is distributed approximately normally about /J with 
variance unity. 

(Fisher, 1928.) 

15.7. Show that the distribution function of JR in normal samples may be written, 
if n — p is even, in the form 

(1 — Jl®)"*" JRP -1 (1 _ 

X F|-i, - 

(Fisher, 1928.) 









CHAPTER 16 
RANK CORRELATION 

16.1. In previous chapters we have considered the dependence of attributes, as 
measured by coefficients of association, and that of variables as measured (in the normal 
case at least) by product-moment correlation. In this chapter we shall consider a type 
of relationship which, in a sense, occupies an intermediate position between the two, the 
correlation of ranks. 

Consider a set of individuals which can be arranged in order according to some quality, 
such as a set of men according to ability or a set of musical compositions according to the 
degree of preference with which they are regarded by some observer. An ordered arrangement
of the objects will be called a ranking and the ordinal number of a given individual 
in the ranking is called his rank. Thus with a ranking of n individuals there will be one 
rank corresponding to each of the n ordinal numbers 1 to n. 

16.2. Ranking is less general than the classification of attributes in the sense that
the division of a population into classes A and not-A, or $A_1, A_2, \ldots, A_k$, does not require
any ordering of those classes; the measures of contingency and association discussed in
Chapter 13 are invariant under rearrangements of columns or rows in the tables. On the
other hand, individuals arranged in an ordinary frequency table have their interrelationships
more closely defined than if they are merely ranked, so that ranking is in a sense more
general than measurement according to a variate-scale. To put the point in a slightly
different way, a ranking is invariant under any transformation which stretches the scale
of measurement of the variate.

16.3. In practice, ranked data usually arise in two ways:— 

(a) From material which could be measured on a variate-scale but which is not so 
measured for reasons of economy, lack of adequate instruments, and so forth. This class 
includes the case where the data are given as measurements but are then ranked on the 
basis of those measurements in order, for example, to reduce the arithmetical work in 
investigating correlations. 

(b) From material which is believed to be capable of measurement theoretically but 
cannot be measured in practice, e.g. human preferences for food or intelligence. Ranking 
methods are sometimes applied rather uncritically to material which the experimenter 
considers to be capable of ranking, whether it has been demonstrated to be so or not. We 
shall return to this point below. 

It is always possible by suitable conventions to impose a scale of measurement and 
hence a variate-system on ranked material; but the process is sometimes rather artificial 
and we shall in the first instance consider ranked material as such, without reference to 
the possibility of there being any pre-existent or superimposed variate in the background. 

Spearman's Coefficient of Rank Correlation 

16.4. Consider a set of n individuals ranked according to two variables in the orders
$X_1, X_2, \ldots, X_n$ and $Y_1, Y_2, \ldots, Y_n$, where the X's and the Y's are permutations of the
first n natural numbers. Our problem is to discuss the relationship between the X's and the Y's.
If the individuals are denoted by $A_1, \ldots, A_n$ we may write the rankings in the form

   Individual   A_1   A_2   ...   A_n
   Ranking 1    X_1   X_2   ...   X_n        (16.1)
   Ranking 2    Y_1   Y_2   ...   Y_n

We note first of all that the concordance between rankings is perfect if and only if
$X_j = Y_j$ for all j. It is natural to consider the differences $X_j - Y_j$ ($= d_j$, say) as measuring
the difference between the two rankings. They are zero if and only if the concordance
is perfect and their magnitude to some extent reflects the divergence of the rankings from
perfect concordance. We also note that

\[ \sum_{j=1}^{n} d_j = \sum_{j=1}^{n} X_j - \sum_{j=1}^{n} Y_j = 0, \tag{16.2} \]


for each of the sums of X and Y is the sum of the first n natural numbers. We might
then take $\Sigma|d|$ as a measure of discordance, and a coefficient based on this quantity was in
fact proposed by Spearman (1906). It is however subject to several disadvantages, similar
to those attaching to the mean deviation, and a more suitable measure is obtained by
using $\Sigma(d^2)$. It is easy to see that the maximum value possible for $\Sigma(d^2)$ is $\frac{1}{3}(n^3 - n)$. For
$\Sigma(d^2)$ is the greatest if the d's are as different as possible, i.e. if one ranking is the reverse
of the other, so that the d's are $(n-1), (n-3), \ldots, -(n-3), -(n-1)$, though not
necessarily in that order. In this case

\[
\begin{aligned}
\Sigma(X_jY_j) &= 1(n) + 2(n-1) + 3(n-2) + \ldots + n\{n-(n-1)\} \\
 &= 1\{(n+1)-1\} + 2\{(n+1)-2\} + \ldots + n\{(n+1)-n\} \\
 &= (n+1)\sum_{j=1}^{n} j - \sum_{j=1}^{n} j^2 \\
 &= \frac{n(n+1)^2}{2} - \frac{n(n+1)(2n+1)}{6} \\
 &= \frac{n(n+1)(n+2)}{6}.
\end{aligned}
\tag{16.3}
\]

Thus

\[
\begin{aligned}
\Sigma(d^2) &= \Sigma(X^2) + \Sigma(Y^2) - 2\Sigma(XY) \\
 &= \frac{n(n+1)(2n+1)}{3} - \frac{n(n+1)(n+2)}{3} \\
 &= \frac{n^3-n}{3}.
\end{aligned}
\tag{16.4}
\]

We then define

\[ \rho = 1 - \frac{6\,\Sigma(d^2)}{n^3-n} \tag{16.5} \]

as the Spearman coefficient of rank correlation. If the concordance between rankings is
perfect $\Sigma(d^2) = 0$ and $\rho = 1$. If the discordance is perfect $\rho = -1$. In other cases $\rho$ lies
between these limits.








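It is worth noticing that $\rho$ is the product-moment coefficient of correlation between
X and Y when we regard the ranks as variate-values. For we then have

\[ n \operatorname{var} X = n \operatorname{var} Y = \frac{n(n+1)(2n+1)}{6} - n\left(\frac{n+1}{2}\right)^2 = \frac{n^3-n}{12}, \tag{16.6} \]

\[
\begin{aligned}
n \operatorname{cov}(X, Y) &= \Sigma(XY) - n\{E(X)\}^2 \\
 &= -\tfrac{1}{2}\Sigma(X-Y)^2 + \Sigma(X^2) - \frac{n(n+1)^2}{4} \\
 &= \frac{n^3-n}{12} - \tfrac{1}{2}\Sigma(d^2),
\end{aligned}
\]

so that the product-moment correlation coefficient of X and Y is

\[ \frac{\dfrac{n^3-n}{12} - \tfrac{1}{2}\Sigma(d^2)}{\dfrac{n^3-n}{12}} = 1 - \frac{6\,\Sigma(d^2)}{n^3-n} = \rho. \tag{16.7} \]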


16.5. There is an element of artificiality in the Spearman coefficient as defined which
we must remove. The ranks are ordinal numbers and cannot without justification be
operated on by the laws of cardinal arithmetic. For instance, if $A_1$ is ranked 4th and
8th by two observers, $d_1$ is $(4 - 8)$; but what does 4th minus 8th mean, and what significance
is to be attached to its square? It is not entirely trivial to note that the necessary
transition from ordinals to cardinals may be made without invoking a variate-scale. When
we rank a member as r we mean that in the set of n, $(r - 1)$ members are ranked higher.
This number $(r - 1)$ is a cardinal and in our particular example 4th minus 8th may be
regarded as meaning that the difference of the number of members ranked higher by the
two observers was 4.


Example 16.1

Two judges in a beauty contest rank the 10 competitors in the following order:

   6   4   3   1   2   7   9   8  10   5
   4   1   6   7   5   8  10   9   3   2

What is the rank correlation?

The differences between the ranks are

   2   3  -3  -6  -3  -1  -1  -1   7   3

which sum to zero as they should.

Thus $\Sigma(d^2) = 4 + 9 + 9 + 36 + \text{etc.} = 128$ and

\[ \rho = 1 - \frac{6 \times 128}{990} = 0.224. \]

This indicates some sort of concordance between the standards of the two judges, but not
a very strong concordance.

Example 16.2

In the previous example there was no information about the "real" order of the
competitors, and $\rho$ merely served to measure the degree of agreement between judges.
Consider, however, the following case, where an objective order is known: In a test for
ability to distinguish shades of colour, ten discs were prepared ranging from light to dark
red, and a subject was asked to arrange them in order. The true order, as determined by
a colorimetric method, was

   1, 2, 3, 4, 5, 6, 7, 8, 9, 10.

The order produced by the subject was

   4, 7, 2, 10, 3, 6, 8, 1, 5, 9.

What sort of a judge is he?

The differences are

   -3, -5, 1, -6, 2, 0, -1, 7, 4, 1

and $\Sigma(d^2) = 142$, $\rho = 0.139$.

The coefficient is low and we conclude that the observer was a poor judge.
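
The arithmetic of Examples 16.1 and 16.2 may be checked mechanically. What follows is a minimal sketch in Python and is not part of the original text; the function name `spearman_rho` is an illustrative assumption, and the coefficient is computed from (16.5) with exact fractions so that no rounding enters before the final display.

```python
from fractions import Fraction


def spearman_rho(x, y):
    """Spearman's rho for two rankings of the same n individuals, using (16.5)."""
    n = len(x)
    natural = list(range(1, n + 1))
    if sorted(x) != natural or sorted(y) != natural:
        raise ValueError("each ranking must be a permutation of 1..n")
    sum_d2 = sum((a - b) ** 2 for a, b in zip(x, y))
    return 1 - Fraction(6 * sum_d2, n**3 - n)


# Example 16.1: two judges ranking ten competitors.
judge_1 = [6, 4, 3, 1, 2, 7, 9, 8, 10, 5]
judge_2 = [4, 1, 6, 7, 5, 8, 10, 9, 3, 2]
rho_judges = spearman_rho(judge_1, judge_2)

# Example 16.2: the objective order against the subject's order.
true_order = list(range(1, 11))
subject = [4, 7, 2, 10, 3, 6, 8, 1, 5, 9]
rho_subject = spearman_rho(true_order, subject)

print(round(float(rho_judges), 3), round(float(rho_subject), 3))
```

The check that each input is a permutation of 1 to n reflects the definition of a ranking in 16.1; tied ranks are outside the scope of this sketch.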

An Alternative Coefficient

16.6. A second coefficient of rank correlation which has certain advantages may
be obtained as follows: Consider again the ranking of the previous example

   4   7   2   10   3   6   8   1   5   9        (16.8)

Consider the order of the nine pairs of numbers obtained by taking the first number 4
with each succeeding number. The first pair, 4, 7, is in the correct order (in the sequence
1, 2, ..., 10) and we therefore allot it the score +1. The second pair, 4, 2, is in the wrong
order and we therefore score -1. The nine scores will be found to be

   +1 -1 +1 -1 +1 +1 -1 +1 +1, totalling +3.

Consider next the scores of the second number 7, with its eight succeeding numbers. They
are

   -1 +1 -1 -1 +1 -1 -1 +1, totalling -2.

Proceeding thus with each number we find 9 scores as follows:

   +3, -2, +5, -6, +3, 0, -1, +2, +1.

The total of these scores is +5.

Now the maximum score, obtained if the numbers are all in the objective order 1, 2,
..., 10, is 45. We therefore define the rank correlation coefficient $\tau$ as the ratio of the
actual score to the maximum score, i.e., in the present case,

\[ \tau = \frac{5}{45} = 0.111, \]

as compared with $\rho = 0.139$ for the Spearman coefficient.

Generally, if there are n individuals the maximum score, obtained if and only if they
are in the order (1, 2, ..., n), is $(n-1) + (n-2) + \ldots + 1 = \frac{1}{2}n(n-1)$. Denoting
the actual score by S, we have then for the coefficient of rank correlation

\[ \tau = \frac{2S}{n(n-1)}. \tag{16.9} \]
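
The pair-by-pair scoring just described can be written out directly. The sketch below is an illustration only; the names `pair_score` and `tau_from_score` are assumptions introduced here, and the ranking is the one used in the worked example (4, 7, 2, 10, 3, 6, 8, 1, 5, 9).

```python
from itertools import combinations


def pair_score(ranking):
    """Score S: +1 for every pair standing in the natural order, -1 otherwise."""
    # combinations() keeps positional order, so `a` always precedes `b`.
    return sum(1 if a < b else -1 for a, b in combinations(ranking, 2))


def tau_from_score(s, n):
    """Ratio of the actual score to the maximum score n(n - 1)/2."""
    return 2 * s / (n * (n - 1))


ranking = [4, 7, 2, 10, 3, 6, 8, 1, 5, 9]
s = pair_score(ranking)
tau = tau_from_score(s, len(ranking))
print(s, round(tau, 3))
```

Scoring every pair costs n(n - 1)/2 comparisons, which is negligible for rankings of the size considered in this chapter.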


16.7. The actual calculation of S may be shortened considerably. Looking again
at the ranking (16.8) we see that the number 1 has two numbers on its right and seven
on its left. We therefore score $2 - 7 = -5$ and strike out the 1. In the remaining
ranking, the number 2 has 6 numbers on its right and two on its left, and hence we score
$6 - 2 = +4$; we then strike out the 2 and proceed with the 3 as before. It will be found
that the scores obtained are

   -5, +4, +1, +6, -3, 0, +3, 0, -1.

The total of these scores is +5, and is equal to S. The rule is quite general. Its validity
is evident from the consideration that instead of taking each number with its succeeding
numbers we consider pairs contributing to S in a different way. Taking the number 1
first, and remembering that all other numbers are greater than 1, we see that any number
on the left must contribute -1, and any number on the right +1, to S. When 1 is struck
out the procedure remains valid for 2, and so on.

Alternatively the following procedure may be adopted. Considering again (16.8), we
see that the number 4 has on its right 6 greater numbers, the 7 has 3 greater numbers,
and so on, the numbers being

   6, 3, 6, 0, 4, 2, 1, 2, 1,

totalling 25. There must therefore be $45 - 25 = 20$ numbers lying to the right of successive
numbers in the ranking which are less than those numbers, and hence $S = 25 - 20 = 5$ as
before. Generally, if the number obtained by counting greater numbers is k,

\[ S = 2k - \frac{n(n-1)}{2} \]

and thus

\[ \tau = \frac{4k}{n(n-1)} - 1. \tag{16.10} \]

A check may be obtained by counting greater numbers lying to the left. If the total
of such numbers is l,

\[ S = \frac{n(n-1)}{2} - 2l. \tag{16.11} \]
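
Both shortened procedures of this section lend themselves to a short check. The following Python sketch is illustrative and not from the original; the function names are assumptions. It applies the striking-out rule and the count of greater numbers to the right (the quantity k) and to the left (the quantity l) for the ranking 4, 7, 2, 10, 3, 6, 8, 1, 5, 9.

```python
def score_by_striking_out(ranking):
    """Take 1, 2, 3, ... in turn: score (numbers to the right) - (numbers to the left), then strike out."""
    remaining = list(ranking)
    total = 0
    for value in sorted(ranking):
        i = remaining.index(value)
        total += (len(remaining) - 1 - i) - i
        remaining.pop(i)
    return total


def greater_to_right(ranking):
    """k: for each number, how many greater numbers lie to its right."""
    return sum(1 for i, a in enumerate(ranking) for b in ranking[i + 1:] if b > a)


def greater_to_left(ranking):
    """l: for each number, how many greater numbers lie to its left."""
    return sum(1 for i, a in enumerate(ranking) for b in ranking[:i] if b > a)


ranking = [4, 7, 2, 10, 3, 6, 8, 1, 5, 9]
n = len(ranking)
k = greater_to_right(ranking)
l = greater_to_left(ranking)
s_strike = score_by_striking_out(ranking)
s_from_k = 2 * k - n * (n - 1) // 2
s_from_l = n * (n - 1) // 2 - 2 * l
print(k, l, s_strike, s_from_k, s_from_l)
```

The three routes to S agree because k and l together account for all n(n - 1)/2 pairs.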


16.8. The extension of the use of $\tau$ to the case where no objective order is given
requires a little further consideration. Suppose we have two rankings as follows:

         A1   A2   A3   A4   A5   A6   A7   A8   A9   A10
   P      6    9    4    3    5   10    2    1    8    7        (16.12)
   Q      6    5   10    2    3    9    7    4    1    8

$\tau$ may be obtained by arranging one ranking in the natural order (1, 2, ..., n) thus:

         A8   A7   A4   A3   A5   A1   A10  A9   A2   A6
   P'     1    2    3    4    5    6    7    8    9   10        (16.13)
   Q'     4    7    2   10    3    6    8    1    5    9

and then finding $\tau$ between P' and Q' as in the preceding section. We have however to
show that if we arrange Q in the natural order, giving

         A9   A4   A5   A8   A2   A1   A7   A10  A6   A3
   P''    8    3    5    1    9    6    2    7   10    4        (16.14)
   Q''    1    2    3    4    5    6    7    8    9   10

then $\tau$ between P'' and Q'' is the same as that between P' and Q'. That this must be so
may be seen as follows:

In (16.13) the successive contributions to S are, as found by the method of 16.6,

   +3, -2, +5, -6, +3, 0, -1, +2, +1.

Consider now the contributions to S from (16.14) when the short method of 16.7 is used. They
will be found to be exactly the same. If the permutation Q' begins with $a_1$ the contribution to
$S_{Q'}$ from pairs involving $a_1$ will be $(n - a_1) - (a_1 - 1)$. In P'' the $a_1$th number will be 1 and
the contribution to $S_{P''}$ will also be $(n - a_1) - (a_1 - 1)$. If the second number in Q' is
$a_2$ the contribution to $S_{Q'}$ will be $(n - a_2) - (a_2 - 1) \pm 1$ according to whether $a_1$ is
less than $a_2$ or not. In P'' the $a_2$th number will be 2 and the contribution to $S_{P''}$ is
also $(n - a_2) - (a_2 - 1) \pm 1$ according to whether 1 lies on the left or the right of 2 in
P'', i.e. whether $a_1$ is less than $a_2$ or not; and so on.

In practical calculations it is not necessary to carry out the rearrangements. Consider
again (16.12). The number 1 in Q has an 8 above it in P. In the ranking of the A's by P, 8
has two members to the right and seven to the left. Score, therefore, -5, and strike
out $A_9$. The number 2 in Q has a 3 above it in P, and $A_4$ has six members to its right (ignoring
$A_9$) and two to its left, score +4; and so on, the scores being

   -5, +4, +1, +6, -3, 0, +3, 0, -1

totalling +5 which is equal to S.
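
That the score does not depend on which ranking is put into the natural order can be illustrated numerically. The sketch below is an assumption-laden illustration, not the author's procedure: it takes the two rankings of (16.12) to be P = 6, 9, 4, 3, 5, 10, 2, 1, 8, 7 and Q = 6, 5, 10, 2, 3, 9, 7, 4, 1, 8, and the function names are invented for the purpose.

```python
from itertools import combinations


def kendall_score(p, q):
    """S for two rankings of the same individuals: +1 when a pair is ordered alike in both, else -1."""
    return sum(
        1 if (p[i] - p[j]) * (q[i] - q[j]) > 0 else -1
        for i, j in combinations(range(len(p)), 2)
    )


def rearrange(by, other):
    """Put `by` into the natural order and carry `other` along with it."""
    return [o for _, o in sorted(zip(by, other))]


p = [6, 9, 4, 3, 5, 10, 2, 1, 8, 7]
q = [6, 5, 10, 2, 3, 9, 7, 4, 1, 8]

q_prime = rearrange(p, q)    # Q' of (16.13)
p_second = rearrange(q, p)   # P'' of (16.14)
natural = list(range(1, len(p) + 1))

s_direct = kendall_score(p, q)
s_q_prime = kendall_score(natural, q_prime)
s_p_second = kendall_score(p_second, natural)
print(q_prime, p_second, s_direct, s_q_prime, s_p_second)
```

Since `kendall_score` depends only on how each pair of individuals is ordered in the two rankings, relabelling or reordering the individuals cannot change it, which is the substance of the argument above.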

16.9. Like $\rho$, $\tau$ is +1 only if the correspondence between two rankings is perfect
and -1 only if the rankings are inverted. In actual practice the values given by the
two coefficients bear a nearly constant ratio (cf. 16.24) and one appears to be as good as
the other so far as providing a measure of ranking concordance is concerned. $\rho$ is, however,
easier to calculate and is probably the most convenient to use. Against this must be
set certain difficulties in its sampling distribution, which will be referred to below, and the
fact that $\tau$ can be generalised to the case of partial rank correlations.

16.10. In considering the interpretation of any particular value of $\rho$ or $\tau$ the question
naturally arises, are such values significant in the statistical sense, i.e. can they have arisen
by chance from a population in which the qualities under consideration are independent?
And further, can we assign a standard error to the observed values? The second question
is not an easy one to answer, or even to understand, unless ranks are related to variate-values.
In the sampling of variates we are given a set of n values emanating from a population
of values. In the ranking case we are given n ordinal numbers, but it is useless
to consider them as emanating from a population of (different) ordinal numbers. This
point will be considered later when we introduce the concept of grades (16.25).

The sampling problem, however, acquires a definite meaning if the two qualities under
consideration are independent. In such a case the pairs of rankings of n members drawn
at random are independent; and consequently in a large number of samples there will
occur in equal amounts every ranking according to one quality associated with every
ranking according to the other. We are thus led to consider the distributions of $\rho$ and $\tau$
in populations consisting of all possible associations of all possible rankings. Clearly no
generality is lost if we fix one ranking as the order (1, 2, ..., n) and consider its correlations
with the n! possible permutations of those numbers. If a given $\rho$ or $\tau$ cannot, to an acceptable
degree of probability, have arisen from such a population, we are justified in concluding
that the two qualities have some definite relationship in the population.

Sampling Distribution of Spearman's $\rho$ in the Case of Independence 

16.11. Consider then the distribution of values of $\rho$ in the population obtained by
correlating the order (1, 2, ..., n) with every possible permutation of the n natural numbers.
We shall, in fact, find it more convenient to consider the distribution of $\Sigma(d^2)$, which is
simply related to $\rho$ by equation (16.5). Certain elementary properties of the distribution
are obtainable immediately.

(a) Any value of $\Sigma(d^2)$ must be even; for $\Sigma(d) = 0$ and hence the number of odd
values of d, and thus of $d^2$, is even.

(b) The possible values of $\Sigma(d^2)$ range from 0 to $\frac{1}{3}(n^3 - n)$ and hence there are
$\frac{1}{6}(n^3 - n) + 1$ of them.

(c) The distribution is symmetrical, about a central value if $\frac{1}{6}(n^3 - n)$ is even, or
about two adjacent central values if it is odd. This follows from the fact that to any value
of $\rho$ corresponding to a permutation P there will correspond a negative value of $\rho$, of the
same absolute value, arising from P inverted. For if P is $X_1, \ldots, X_n$, the inverted
permutation is $X_n, X_{n-1}, \ldots, X_1$. $\Sigma(d^2)$ calculated from P is then $\sum_{i=1}^{n}(X_i - i)^2$ and
that from P inverted is $\sum_{i=1}^{n}\{X_i - (n + 1 - i)\}^2$. The sum of these two is

\[ \Sigma(X_i^2) + \Sigma(i^2) - 2\Sigma(X_i i) + \Sigma(X_i^2) + \Sigma(n+1-i)^2 - 2\Sigma\{X_i(n+1-i)\}. \]

The first, second, fourth and fifth terms in this expression are each equal to $\Sigma(i^2)$, i.e. to
$\frac{1}{6}n(n+1)(2n+1)$. The sum of the third and sixth is

\[ -2(n+1)\Sigma(X) = -n(n+1)^2. \]

Thus the sum of the two $\Sigma(d^2)$ is

\[ \tfrac{2}{3}n(n+1)(2n+1) - n(n+1)^2 = \tfrac{1}{3}(n^3 - n). \]

Thus we see from (16.5) that the sum of the corresponding $\rho$'s is zero.

(d) It follows that all odd moments of the distribution of $\Sigma(d^2)$ about the mean vanish.
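
Properties (a) to (d) are easily confirmed by complete enumeration for small n. The following Python sketch is an illustration under that assumption only (the function name `sum_d2_distribution` is invented here); it is not a substitute for the argument above and becomes slow well before n reaches 10, since it visits all n! permutations.

```python
from collections import Counter
from itertools import permutations


def sum_d2_distribution(n):
    """Frequency of each value of Sigma(d^2) over all n! permutations of 1..n."""
    base = range(1, n + 1)
    return Counter(
        sum((x - i) ** 2 for x, i in zip(perm, base)) for perm in permutations(base)
    )


n = 5
dist = sum_d2_distribution(n)
top = (n**3 - n) // 3                       # greatest possible Sigma(d^2)
mean = sum(v * f for v, f in dist.items()) / sum(dist.values())

all_even = all(v % 2 == 0 for v in dist)                      # property (a)
in_range = min(dist) == 0 and max(dist) == top                # property (b)
symmetric = all(dist[v] == dist[top - v] for v in dist)       # property (c)
third_moment = sum(f * (v - mean) ** 3 for v, f in dist.items())  # property (d)
print(sorted(dist.items()))
print(all_even, in_range, symmetric, third_moment)
```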

16.12. Consider the deviations between the order 1, 2, ..., n and an order X. If
one deviation is known, then certain deviations become impossible for other ranks. For
instance, if the deviation $d_1$ between $X_1$ and 1 is $(n - 1)$, then $X_1 = n$, and it is impossible
for the deviation between $X_2$ and 2 to be $(n - 2)$; or for the deviation between $X_3$ and 3
to be $(n - 3)$, and so on. Consider then the array:

\[
\begin{array}{ccccccc}
n-1 & n-2 & n-3 & \ldots & 2 & 1 & 0 \\
n-2 & n-3 & n-4 & \ldots & 1 & 0 & -1 \\
n-3 & n-4 & n-5 & \ldots & 0 & -1 & -2 \\
\vdots & \vdots & \vdots & & \vdots & \vdots & \vdots \\
2 & 1 & 0 & \ldots & -(n-5) & -(n-4) & -(n-3) \\
1 & 0 & -1 & \ldots & -(n-4) & -(n-3) & -(n-2) \\
0 & -1 & -2 & \ldots & -(n-3) & -(n-2) & -(n-1)
\end{array}
\]


If one deviation has the value in the rth row and the kth column, then no other deviation can have
a value in the rth row or in the kth column; and so on.

In fact, any permissible set of deviations is given by taking n entries from the above
table so that no row or column contributes more than one entry.

Hence to get $\Sigma(d^2)$ for any permissible set, write

\[
E = \begin{bmatrix}
a^{0} & a^{1} & a^{4} & \ldots & a^{(n-1)^2} \\
a^{1} & a^{0} & a^{1} & \ldots & a^{(n-2)^2} \\
a^{4} & a^{1} & a^{0} & \ldots & a^{(n-3)^2} \\
\vdots & \vdots & \vdots & & \vdots \\
a^{(n-1)^2} & a^{(n-2)^2} & a^{(n-3)^2} & \ldots & a^{0}
\end{bmatrix}
\]

and $\Sigma(d^2)$ is given by the index of a of one of the terms obtained from E by choosing n
factors so that no row or column appears more than once and multiplying them together.
Thus the distribution of $\Sigma(d^2)$ is given by the totality of n! terms which can be constructed
in that way. E will be taken to be equal to the polynomial in a given by the sum of these
terms, the so-called "permanent."
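
The expansion of E as a polynomial in a can be carried out mechanically by working down the rows and remembering which columns have been used. The sketch below is a hedged illustration rather than the method used to compile the tables: it assumes the element in row i and column j of E is $a^{(i-j)^2}$, which is how the array is read here, and the function name `permanent_exponents` is an invention for the example.

```python
from collections import Counter
from functools import lru_cache


def permanent_exponents(n):
    """Expand the permanent of E = [a^((i-j)^2)] as a mapping {index of a: coefficient}."""

    @lru_cache(maxsize=None)
    def expand(row, used):
        # `used` is a bit mask of the columns already chosen in earlier rows.
        if row == n:
            return ((0, 1),)
        poly = Counter()
        for col in range(n):
            if not used & (1 << col):
                for exponent, coefficient in expand(row + 1, used | (1 << col)):
                    poly[exponent + (row - col) ** 2] += coefficient
        return tuple(sorted(poly.items()))

    return dict(expand(0, 0))


for size in (3, 4):
    print(size, permanent_exponents(size))
```

Because no signs are attached to the terms, the coefficients simply count permutations, so the coefficient of each power of a is the frequency of the corresponding value of $\Sigma(d^2)$.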




16.13. E bears an obvious analogy to the determinant, but it cannot be regarded
as such and expanded accordingly. If it could, the distribution of $\Sigma(d^2)$ would be obtained
without difficulty, for a determinant with the elements of E as given above may be shown
to be equal to

\[ (1 - a^2)^{n-1}(1 - a^4)^{n-2}(1 - a^6)^{n-3} \ldots (1 - a^{2n-2}). \]

E, in fact, lacks the fundamental property of the determinant in that it does not change
sign if two rows or columns are interchanged.

Nevertheless certain of the rules of determinantal algebra remain true for E. The
most valuable is that E may be expanded in terms of its minors of any order in the usual
way. Expansion of this type is, in fact, rather easier with E than with the determinant,
for all terms of E are essentially positive and there are no difficulties with signs. Such
expansions were used in obtaining the distributions given below. There are also certain
devices which assist the expansion of E in virtue of its symmetry. Two which will be found
useful are as follows:

(a) Any minor of E is symmetrical in powers of a, i.e. is of the form

\[ A_0a^{l} + A_1a^{l+2} + A_2a^{l+4} + \ldots + A_2a^{m-4} + A_1a^{m-2} + A_0a^{m}. \]

(b) The effect of shifting a minor bodily across E is to multiply each term of its
expansion by a constant power of a.




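TABLE 16.1

Spearman's $\rho$. Distribution of $\Sigma(d^2)$ for Values of n from 1 to 8.

                          Values of n
 Σ(d²)     n=1     n=2     n=3     n=4     n=5     n=6     n=7     n=8
     0       1       1       1       1       1       1       1       1
     2       .       1       2       3       4       5       6       7
     4       .       .       0       1       3       6      10      15
     6       .       .       2       4       6       9      14      22
     8       .       .       1       2       7      16      29      47
    10       .       .       .       2       6      12      26      54
    12       .       .       .       2       4      14      35      70
    14       .       .       .       4      10      24      46      94
    16       .       .       .       1       6      20      55     129
    18       .       .       .       3      10      21      54     124
    20       .       .       .       1       6      23      74     178
    22       .       .       .       .      10      28      70     183
    24       .       .       .       .       6      24      84     237
    26       .       .       .       .      10      34      90     238
    28       .       .       .       .       4      20      78     276
    30       .       .       .       .       6      32      90     264
    32       .       .       .       .       7      42     129     379
    34       .       .       .       .       6      29     106     349
    36       .       .       .       .       3      29     123     380
    38       .       .       .       .       4      42     134     400
    40       .       .       .       .       1      32     147     517
    42       .       .       .       .       .      20      98     394
    44       .       .       .       .       .      34     168     542
    46       .       .       .       .       .      24     130     492
    48       .       .       .       .       .      28     175     640
    50       .       .       .       .       .      23     144     557
    52       .       .       .       .       .      21     168     666
    54       .       .       .       .       .      20     144     595
    56       .       .       .       .       .      24     184     776   (median for n = 7)
    58       .       .       .       .       .      14       .     684
    60       .       .       .       .       .      12       .     786
    62       .       .       .       .       .      16       .     718
    64       .       .       .       .       .       9       .     922
    66       .       .       .       .       .       6       .     745
    68       .       .       .       .       .       5       .     917
    70       .       .       .       .       .       1       .     781
    72       .       .       .       .       .       .       .     982
    74       .       .       .       .       .       .       .     826
    76       .       .       .       .       .       .       .     950
    78       .       .       .       .       .       .       .     844
    80       .       .       .       .       .       .       .    1066
    82       .       .       .       .       .       .       .     845
    84       .       .       .       .       .       .       .     936   (median for n = 8)
Totals       1       2       6      24     120     720   5040*  40,320*

* Total of whole distribution, only the median value and the values
on one side of the median being shown in this table. A dot marks a value of $\Sigma(d^2)$
which lies outside the range for that n or is not shown.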









TABLE 16.2

Spearman's $\rho$. Probability that $\Sigma(d^2)$ will be Attained or Exceeded for Values of n from 4 to 8
inclusive.

Σ(d²)   0       2       4       6       8       10      12      14      16      18      20      22      24      26      28
n = 4   1       0.958   0.833   0.792   0.625   0.542   0.458   0.375   0.208   0.167   0.042
n = 5   1       0.992   0.958   0.933   0.883   0.825   0.775   0.742   0.658   0.608   0.525   0.475   0.392   0.342   0.258
n = 6   1       0.999   0.992   0.983   0.971   0.949   0.932   0.912   0.879   0.851   0.822   0.790   0.751   0.718   0.671
n = 7   1       1.000   0.999   0.997   0.994   0.988   0.983   0.976   0.967   0.956   0.945   0.931   0.917   0.900   0.882
n = 8   1       1.000   1.000   0.999   0.999   0.998   0.996   0.995   0.992   0.989   0.986   0.982   0.977   0.971   0.965

Σ(d²)   30      32      34      36      38      40      42      44      46      48      50      52      54      56      58
n = 5   0.225   0.175   0.117   0.067   0.042   0.0083
n = 6   0.643   0.599   0.540   0.500   0.460   0.401   0.357   0.329   0.282   0.249   0.210   0.178   0.149   0.121   0.088
n = 7   0.867   0.849   0.823   0.802   0.778   0.751   0.722   0.703   0.669   0.643   0.609   0.580   0.547   0.518   0.482
n = 8   0.958   0.952   0.943   0.934   0.924   0.915   0.902   0.892   0.878   0.866   0.850   0.837   0.820   0.805   0.786

Σ(d²)   60      62      64      66      68      70      72      74      76      78      80      82      84      86      88
n = 6   0.068   0.051   0.029   0.017   0.0083  0.0014
n = 7   0.453   0.420   0.391   0.357   0.331   0.297   0.278   0.249   0.222   0.198   0.177   0.151   0.133   0.118   0.100
n = 8   0.769   0.750   0.732   0.709   0.690   0.668   0.648   0.624   0.603   0.580   0.559   0.533   0.512   0.488   0.467

Σ(d²)   90      92      94      96      98      100     102     104     106     108     110     112     114     116     118
n = 7   0.083   0.069   0.055   0.044   0.033   0.024   0.017   0.012   0.0062  0.0034  0.0014  0.00020
n = 8   0.441   0.420   0.397   0.376   0.352   0.332   0.310   0.291   0.268   0.250   0.231   0.214   0.195   0.180   0.163

Σ(d²)   120     122     124     126     128     130     132     134     136     138     140     142     144     146     148
n = 8   0.150   0.134   0.122   0.108   0.098   0.085   0.076   0.066   0.057   0.048   0.042   0.035   0.029   0.023   0.018

Σ(d²)   150      152      154      156      158      160      162      164      166      168
n = 8   0.014    0.011    0.0077   0.0054   0.0036   0.0023   0.0011   0.00057  0.00020  0.000025


































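e.g. the minors

\[ M = \begin{bmatrix} a^{0} & a^{1} & a^{4} \\ a^{1} & a^{0} & a^{1} \\ a^{4} & a^{1} & a^{0} \end{bmatrix} = a^{0} + 2a^{2} + 2a^{6} + a^{8} \]

and

\[ M' = \begin{bmatrix} a^{4} & a^{9} & a^{16} \\ a^{1} & a^{4} & a^{9} \\ a^{0} & a^{1} & a^{4} \end{bmatrix} = a^{12}(a^{0} + 2a^{2} + 2a^{6} + a^{8}) \]

are related by

\[ M' = Ma^{12}. \]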





16.14. Tables 16.1 and 16.2 show the frequencies of $\Sigma(d^2)$ for values of n from 1 to
8 inclusive and the probabilities that a given value of $\Sigma(d^2)$ will be attained or exceeded
on random sampling for n from 4 to 8 inclusive.

16.15. The distributions of Table 16.1 are peculiar in several respects. For lower
values of n they are distinctly bimodal. For n = 7 and n = 8 the frequency polygons have
an unusual serrated profile, that for the latter being shown in Fig. 16.1, though normality
is beginning to emerge. It will be shown below that as $n \to \infty$ the distribution tends to
normality, but it is not immediately obvious how a serrated polygon of this kind can do so.
I think that the tails of the curve smooth out first, and that as n increases the smoothness
runs up the curve towards the apex.

Fig. 16.1. Spearman's $\rho$. Frequency Polygon of $\Sigma(d^2)$ for n = 8.

16.16. The calculation of frequencies for n greater than 8 would be a tedious process
and can be obviated by finding curves which satisfactorily approximate to the distribution,
at least so far as its distribution function is concerned. For this purpose we will find the
second and fourth moments of $\rho$ about its mean. The first and third, of course, are zero.

Suppose we measure the rank numbers from their mean, writing for the new variables
$x = X - \frac{1}{2}(n + 1)$, $y = Y - \frac{1}{2}(n + 1)$. Then from 16.4 we have

\[ \rho = \frac{\Sigma(xy)}{N}, \]

where $N = \Sigma(x^2) = \Sigma(y^2) = \frac{1}{12}(n^3 - n)$. Since $E(\rho) = \mu'_1(\rho) = 0$ we have

\[ \operatorname{var}\rho = \frac{1}{N^2}E\{\Sigma(xy)\}^2 = \frac{1}{N^2}E\{\Sigma(x_i^2y_i^2) + \Sigma(x_ix_jy_iy_j)\}, \tag{16.15} \]

where $i \neq j$. Now for any value of x, the corresponding rank Y may have any value from 1 to n. Hence

\[ E\,\Sigma(x_i^2y_i^2) = nE(x^2)E(y^2) = \frac{N^2}{n}. \tag{16.16} \]

Further, in the product term of (16.15) there are $n(n - 1)$ pairs of values $i \neq j$ and thus

\[
\begin{aligned}
E\,\Sigma(x_ix_jy_iy_j) &= n(n-1)E(x_ix_jy_iy_j) \\
 &= n(n-1)\{E(x_ix_j)\}^2 \\
 &= \frac{1}{n(n-1)}\{\Sigma(x_ix_j)\}^2 \\
 &= \frac{N^2}{n(n-1)}.
\end{aligned}
\tag{16.17}
\]

Hence, substituting from (16.16) and (16.17) in (16.15) we have

\[ \operatorname{var}\rho = \frac{1}{n} + \frac{1}{n(n-1)} = \frac{1}{n-1}. \tag{16.18} \]

By the same technique it may be shown that

\[ \mu_4(\rho) = \frac{3(25n^3 - 38n^2 - 35n + 72)}{25n(n+1)(n-1)^3}. \tag{16.19} \]
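
The second and fourth moments can be compared with a complete enumeration for small n. The Python sketch below is offered only as a check under that restriction; the function names are assumptions, the fourth-moment formula is taken as $3(25n^3 - 38n^2 - 35n + 72)/\{25n(n+1)(n-1)^3\}$, and exact fractions are used so that agreement is exact rather than approximate.

```python
from fractions import Fraction
from itertools import permutations


def rho_moments(n):
    """Exact second and fourth moments of rho about zero over all n! permutations."""
    base = range(1, n + 1)
    denom = n**3 - n
    rhos = [
        1 - Fraction(6 * sum((x - i) ** 2 for x, i in zip(perm, base)), denom)
        for perm in permutations(base)
    ]
    count = len(rhos)
    mu2 = sum(r**2 for r in rhos) / count
    mu4 = sum(r**4 for r in rhos) / count
    return mu2, mu4


def mu4_formula(n):
    """Closed form for the fourth moment, as quoted in the lead-in."""
    return Fraction(3 * (25 * n**3 - 38 * n**2 - 35 * n + 72), 25 * n * (n + 1) * (n - 1) ** 3)


for n in (2, 3, 4):
    mu2, mu4 = rho_moments(n)
    print(n, mu2, Fraction(1, n - 1), mu4, mu4_formula(n))
```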





16.17. Consider now the Type II symmetric distribution

\[ dF = \frac{1}{B\left(\frac{1}{2}, \frac{1}{2}n - 1\right)}(1 - x^2)^{\frac{1}{2}(n-4)}\,dx, \qquad -1 \le x \le 1. \tag{16.20} \]

The first and third moments are, of course, zero. The second and fourth are given by

\[ \mu_2 = \frac{1}{n-1}, \tag{16.21} \]

\[ \mu_4 = \frac{3}{n^2-1}. \tag{16.22} \]

The distribution thus has its first three moments the same as those of Spearman's $\rho$ in the
case of independence. The fourth moments are the same to order $n^{-2}$, the difference being

\[ \frac{3}{n^2-1}\left\{1 - \frac{25n^3 - 38n^2 - 35n + 72}{25n(n-1)^2}\right\} \sim -\frac{36}{25n^3}, \]

i.e. of lower order in n than the moments themselves. It has therefore been suggested
that the distribution (16.20) may be used instead of that of $\rho$ to give the distribution function
of the latter for moderate or large n. Tests on the distributions of Table 16.1 indicate
that this is a justifiable approximation.

For instance, when n = 8 the distribution (16.20) becomes

\[ dF = \frac{1}{B\left(\frac{1}{2}, 3\right)}(1 - x^2)^2\,dx \]

and by direct integration the probability of obtaining a value of x greater than $x_0$ in absolute
value is

\[ 1 - \frac{15}{8}\left(x_0 - \tfrac{2}{3}x_0^3 + \tfrac{1}{5}x_0^5\right). \tag{16.23} \]

In comparing this with the values of the $\rho$-distribution it is as well to make a continuity
correction, similar to that of 12.15, to allow for the fact that the distribution of $\rho$ is
discontinuous whereas that of x is continuous. If the values of $\Sigma(d^2)$ are regarded as spread
over a range of one unit on each side of the actual value, the range of $\Sigma(d^2)$ is increased from
$\frac{1}{3}(n^3 - n)$ to $\frac{1}{3}(n^3 - n) + 2$, each terminal contributing a unit. Instead of writing $x = \rho$
we will then write

\[ x = 1 - \frac{\Sigma(d^2)}{\frac{1}{6}(n^3 - n) + 1}. \tag{16.24} \]

Now from Table 16.2 the probability of obtaining a value of $\rho$ greater than $\frac{5}{6}$ in absolute
value, corresponding to $\Sigma(d^2)$ outside the range 14 to 154 inclusive, is 2 × 0.0053 = 0.0106.
The appropriate x from (16.24) is $1 - \frac{14}{85} = 0.835$, and this on substitution in (16.23) gives
the probability of 0.0098. Similarly the chance of getting a value of $\Sigma(d^2)$ outside the range
26 to 142 inclusive is 0.0576. That given by (16.23) is 0.0561. The agreement is evidently
good enough for most practical purposes and would, of course, improve as n increases.
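
The comparison just made for n = 8 can be repeated numerically. The sketch below is an illustration under stated assumptions: it takes the tail probability for n = 8 to be $1 - \frac{15}{8}(x_0 - \frac{2}{3}x_0^3 + \frac{1}{5}x_0^5)$, obtained by integrating the density $\frac{15}{16}(1 - x^2)^2$, and the corrected abscissa to be $x = 1 - \Sigma(d^2)/\{\frac{1}{6}(n^3 - n) + 1\}$; the function names are invented for the example.

```python
def corrected_x(sum_d2, n):
    """Continuity-corrected abscissa corresponding to a given value of Sigma(d^2)."""
    return 1 - sum_d2 / ((n**3 - n) / 6 + 1)


def type_ii_tail_n8(x0):
    """P(|x| > x0) for the Type II curve with n = 8, i.e. density (15/16)(1 - x^2)^2 on (-1, 1)."""
    return 1 - (15 / 8) * (x0 - 2 * x0**3 / 3 + x0**5 / 5)


x_14 = corrected_x(14, 8)
x_26 = corrected_x(26, 8)
p_14 = type_ii_tail_n8(x_14)
p_26 = type_ii_tail_n8(x_26)
print(round(x_14, 3), round(p_14, 4))
print(round(x_26, 3), round(p_26, 4))
```

The closed form is specific to n = 8; for other n the integral of $(1 - x^2)^{\frac{1}{2}(n-4)}$ would have to be evaluated afresh or obtained from tables of the incomplete B-function.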



16.18. If we put, in (16.20),

\[ t = x\left(\frac{n-2}{1-x^2}\right)^{1/2} \]

we obtain the distribution

\[ dF = \frac{1}{(n-2)^{1/2}\,B\left(\frac{1}{2}, \frac{1}{2}n - 1\right)}\,\frac{dt}{\left(1 + \dfrac{t^2}{n-2}\right)^{\frac{1}{2}(n-1)}}, \tag{16.25} \]

the "Student" distribution of Example 10.6. If n is large the continuity correction may
be neglected and to this approximation

\[ x = \rho, \]

so that $\rho$ may be tested in "Student's" distribution by writing

\[ t = \rho\left(\frac{n-2}{1-\rho^2}\right)^{1/2}. \tag{16.26} \]

Example 16.3

In Example 16.2 we found a value of $\rho = 0.139$. Is this significant?

We have n = 10 and from (16.26)

\[ t = 0.139\sqrt{\frac{8}{1 - (0.139)^2}} = 0.397. \]

From Appendix Table 3 we see that the chance of getting such a value or greater in
absolute value is about 0.70 (= 2(1 - 0.65)). The value cannot therefore be regarded as
significant.
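
The transformation used in Example 16.3 is a one-line calculation. The following sketch is illustrative only (the function name `rho_to_t` is an assumption); it evaluates $t = \rho\{(n - 2)/(1 - \rho^2)\}^{1/2}$, to be referred to "Student's" distribution with n - 2 degrees of freedom, and leaves the probability to be read from tables as in the text.

```python
import math


def rho_to_t(rho, n):
    """Convert a rank correlation rho from n individuals into the statistic t."""
    if not -1 < rho < 1:
        raise ValueError("rho must lie strictly between -1 and 1")
    return rho * math.sqrt((n - 2) / (1 - rho**2))


t_value = rho_to_t(0.139, 10)
print(round(t_value, 3))
```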


16.19. As n tends to infinity the t-distribution tends to the normal form and we
therefore suspect that $\rho$ also tends to normality. That this is in fact so may be seen as
follows, the proof being due to Hotelling and Pabst (1936).

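The general moment of $\rho$ of even order is given by

\[ \mu_{2a}(\rho) = \frac{1}{S_2^{2a}}E\left\{\left(\sum_{i=1}^{n}x_iy_i\right)^{2a}\right\}, \tag{16.27} \]

where $S_2$ is written for $\sum_{i=1}^{n}x_i^2$ and generally $S_r$ for $\sum_{i=1}^{n}x_i^r$. When the parenthesis is expanded
we may, in virtue of the independence of x and y, take expectations term by term, regarding
the x's as constant. Now

\[ E(y_i^{2a}) = E(x_i^{2a}) = \frac{1}{n}\Sigma(y^{2a}) = \frac{S_{2a}}{n}, \]

\[ E(y_i^{2a-1}y_j) = \frac{1}{n(n-1)}\Sigma(x_i^{2a-1}x_j), \qquad i \neq j, \]

and so on. Hence

\[ \mu_{2a} = \frac{1}{S_2^{2a}}\left\{A_1\frac{\{\Sigma(x_i^{2a})\}^2}{n} + A_2\frac{\{\Sigma(x_i^{2a-1}x_j)\}^2}{n(n-1)} + \ldots\right\}, \tag{16.28} \]

where the coefficients A depend on a but not on n. We proceed to show that the term
of greatest degree in n in (16.28) is the term in $\{\Sigma(x_1^2x_2^2 \ldots x_a^2)\}^2$.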

The numerator of any term in (16.28), being a symmetric function of the x's, can be
expressed in terms of the symmetric sums $S_p$. Further $S_p$ vanishes if p is odd. Since
any $S_l$ is of degree $l + 1$ in n, the degree of a non-vanishing term $S_{l_1}S_{l_2} \ldots S_{l_p}$ is
$\Sigma(l + 1) = 2a + p$. Consequently the term of highest degree in n must contain as
high a p as possible, that is to say as many S's as possible, subject to the requirement that
the subscript of each S must be even.

Now consider a term

\[ \Sigma(x_1^{\alpha_1}x_2^{\alpha_2} \ldots x_p^{\alpha_p}) = S_{\alpha_1}S_{\alpha_2} \ldots S_{\alpha_p} - \Sigma S_{\alpha_1+\alpha_2}S_{\alpha_3} \ldots S_{\alpha_p} + 2\Sigma S_{\alpha_1+\alpha_2+\alpha_3}S_{\alpha_4} \ldots S_{\alpha_p} + \ldots \tag{16.29} \]

If the $\alpha$'s are all even the term of highest degree on the right is, as just remarked, of degree
$2a + p$. If the $\alpha$'s are not all even, suppose there are m even ones and 2q odd ones
($m + 2q = p$). Then the first term in (16.29) vanishes and the term of highest degree
which does not vanish must be obtained by grouping q pairs of odd $\alpha$'s, and hence is of
degree $2a + m + q = 2a + p - q$.

Now in (16.28) the degree of the denominator in each term is the number of different
x's in the numerator. Thus the term of highest degree in n is of degree

\[ 2(2a + p - q) - (m + 2q) = 4a - m + 2p - 4q = 4a + m. \]


This will be a maximum when m is a maximum and therefore when q is zero, in which case 
f» =« a. Hence the greatest degree in in (16.28) arises from the term 
as stated. Now in the expansion of 

4- . . . 

the coefficient of arj ... yf .. . is, by the multinomial theorem, and hence 

2* 


f^2a 


1 (2a)! {E{x\ . . . x/))» 

Six 2 « ■ 


. (16.30) 


The term of highest degree in n in $\Sigma(x_1^2 \cdots x_a^2)$ is that in $S_2^a$, the coefficient of which is
evidently the reciprocal of that of $\Sigma(x_1^2 \cdots x_a^2)$ in

$$S_2^a = (x_1^2 + \cdots + x_n^2)^a,$$

which is a!


M2a 


2«.a! 



i.e. 

Thus, from (16.30), 





Now $\mu_2 = 1/(n-1)$, and thus

$$\frac{\mu_{2a}}{\mu_2^{\,a}} \to \frac{(2a)!}{2^a\,a!}, \qquad (16.31)$$

i.e. to the moments of the normal distribution of unit variance. It follows from the Second
Limit Theorem of 4.24 that the distribution of ρ tends to normality. The tendency is not,
however, very rapid and we have already noticed the peculiar character of the distribution
for lower n.


Distribution of τ in the Case of Independence

16.20. We now consider the distribution of the coefficient τ under similar conditions,
that is to say in a population obtained by correlating a given ranking with all the n! possible
rankings.

Consider a given ranking of the numbers 1, 2, ... n and the effect of inserting an
additional number (n + 1) in the various possible places in the ranking, from the first place
(preceding the first number) to the last place (following the last number).

Inserting a number at the beginning will add −n to the value of S of equation (16.9).
Inserting it between the first and second will add −(n − 2) to S; and so on. Thus to
any frequency-distribution of S for given n, say f(S, n), there will correspond frequencies
f(S − n, n), f(S − (n − 2), n), ..., f(S + n, n), the sum of which gives f(S, n + 1). If the
frequency of a given S is the coefficient of $x^S$ in a polynomial P(x), then the corresponding
values of f in the frequency for (n + 1) are the coefficients of $x^S$ in

$$(x^{-n} + x^{-(n-2)} + \cdots + x^{n})\,P(x).$$

But the frequency-distribution of S when n = 2 is given by $x^{-1} + x$, there being one
value S = −1 and one value S = +1. Thus the frequencies of S for rankings of n are the
coefficients of $x^S$ in the array

$$f = (x^{-1} + x)(x^{-2} + 1 + x^{2})(x^{-3} + x^{-1} + x + x^{3}) \cdots (x^{-(n-1)} + x^{-(n-3)} + \cdots + x^{n-3} + x^{n-1}). \qquad (16.32)$$

It follows that the distribution of S, and hence that of τ, is symmetrical about zero.
The values of S are either all odd or all even, according to whether ½n(n − 1) is odd or even.

The actual frequencies may be calculated by a figurate triangle, as follows:

Value of n      Frequencies of S
    1                         1
    2                       1   1
    3                    1  2   2  1
    4               1  3  5   6   5  3  1
    5      1  4  9  15  20  22  20  15  9  4  1          (16.33)

In this array a number in the rth row is the sum of the number above it and the (r − 1)
numbers to the left of that number. A little reflection will show that this rule follows
from (16.32). The formation of the array is quite simple and several devices shorten the
arithmetic. For instance, in part of the array towards the left a number in the rth row is
the sum of the number immediately above it and the number immediately to the left.
The array is symmetrical and the total in the rth row is r!
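The construction of the array can be sketched in a few lines of Python. This is only an illustrative sketch: the function name `s_frequencies` is an assumption introduced here, and the routine simply multiplies out the factors of (16.32) one at a time, each new factor shifting S by −(k − 1), −(k − 3), ..., (k − 1).

```python
from math import factorial


def s_frequencies(n):
    """Frequencies of S over all n! rankings, returned as {S: count}."""
    freq = {0: 1}  # a single ranking of one object has S = 0
    for k in range(2, n + 1):
        # Placing the k-th object changes S by -(k-1), -(k-3), ..., (k-1).
        steps = range(-(k - 1), k, 2)
        nxt = {}
        for s, count in freq.items():
            for step in steps:
                nxt[s + step] = nxt.get(s + step, 0) + count
        freq = nxt
    return dict(sorted(freq.items()))


if __name__ == "__main__":
    for n in range(1, 6):
        row = list(s_frequencies(n).values())
        print(n, row, "total =", sum(row), "n! =", factorial(n))
```

Each printed row should reproduce the corresponding row of the array, and the row total should equal n!.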





The following tables show the frequency-distribution of S for values of n from 1 to 10
inclusive and the probability that a value of S will be attained or exceeded.

TABLE 16.3

Rank Coefficient τ. Distribution of S for Values of n from 1 to 10 (only the Positive Half
of the Symmetrical Distribution shown).

            Values of n                              Values of n
  S    1    4    5       8        9   |   S    2    3     6     7         10
  0    1    6   22   3,836   29,228   |   1    1    2   101   573    250,749
  2         5   20   3,736   28,675   |   3         1    90   531    243,694
  4         3   15   3,450   27,073   |   5              71   455    230,131
  6         1    9   3,017   24,584   |   7              49   359    211,089
  8              4   2,493   21,450   |   9              29   259    187,959
 10              1   1,940   17,957   |  11              14   169    162,337
 12                  1,415   14,395   |  13               5    98    135,853
 14                    961   11,021   |  15               1    49    110,010
 16                    602    8,031   |  17                    20     86,054
 18                    343    5,545   |  19                     6     64,889
 20                    174    3,606   |  21                     1     47,043
 22                     76    2,191   |  23                           32,683
 24                     27    1,230   |  25                           21,670
 26                      7      628   |  27                           13,640
 28                      1      285   |  29                            8,095
 30                             111   |  31                            4,489
 32                              35   |  33                            2,298
 34                               8   |  35                            1,068
 36                               1   |  37                              440
                                      |  39                              155
                                      |  41                               44
                                      |  43                                9
                                      |  45                                1

16.21. As may be seen by comparing Tables 16.1 and 16.3, the distribution of S,
and hence that of τ, is much smoother than that of Σ(d²) and ρ. We show below that it
tends to normality, and in fact the tendency is so rapid that for values of n greater than
10 the normal distribution provides an adequate approximation. We proceed to find the
second and fourth moments of the distribution.

If we differentiate the expression f in (16.32) and equate x to 1 we evidently obtain
the first moment of S; and generally, writing θ for the operator $x\,\partial/\partial x$,

$$n!\,\mu_r = \left[\theta^r f\right]_{x=1}. \qquad (16.34)$$

For example, when r = 1 we have

$$n!\,\mu_1 = (-1 + 1)(1 + 1 + 1) \cdots (1 + 1 + \cdots + 1) + (1 + 1)(-2 + 0 + 2)(\cdots) + \text{etc.} = 0.$$



TABLE 16.4

Probability that S attains or exceeds a Specified Value. (Shown only for Positive Values.
Negative Values obtainable by Symmetry.)

              Values of n                          Values of n
  S       4        5          8            9   |   S       6         7           10
  0    0.625    0.592      0.548        0.540  |   1    0.500     0.500       0.500
  2    0.375    0.408      0.452        0.460  |   3    0.360     0.386       0.431
  4    0.167    0.242      0.360        0.381  |   5    0.235     0.281       0.364
  6    0.042    0.117      0.274        0.306  |   7    0.136     0.191       0.300
  8             0.042      0.199        0.238  |   9    0.068     0.119       0.242
 10             0.0083     0.138        0.179  |  11    0.028     0.068       0.190
 12                        0.089        0.130  |  13    0.0083    0.035       0.146
 14                        0.054        0.090  |  15    0.0014    0.015       0.108
 16                        0.031        0.060  |  17              0.0054      0.078
 18                        0.016        0.038  |  19              0.0014      0.054
 20                        0.0071       0.022  |  21              0.00020     0.036
 22                        0.0028       0.012  |  23                          0.023
 24                        0.00087      0.0063 |  25                          0.014
 26                        0.00020      0.0029 |  27                          0.0083
 28                        0.000025     0.0012 |  29                          0.0046
 30                                     0.00043 |  31                         0.0023
 32                                     0.00012 |  33                         0.0011
 34                                     0.000025 |  35                        0.00047
 36                                     0.0000028 |  37                       0.00018
                                               |  39                          0.000058
                                               |  41                          0.000015
                                               |  43                          0.0000028
                                               |  45                          0.00000028


When r = 2 the operation on f will result in two types of terms, those in which both
operations operate on one factor of f and those in which the operations operate on separate
factors. When x = 1 these last vanish and thus

$$n!\,\mu_2 = (1^2 + 1^2)\frac{n!}{2} + (2^2 + 2^2)\frac{n!}{3} + \cdots + \{(n-1)^2 + (n-3)^2 + \cdots + (n-3)^2 + (n-1)^2\}\frac{n!}{n}$$

$$= n!\left\{\frac{2}{2}\cdot 1^2 + \frac{2}{3}\cdot 2^2 + \frac{2}{4}(1^2 + 3^2) + \cdots + \frac{2}{n}\left[(n-1)^2 + (n-3)^2 + \cdots\right]\right\}.$$


This may be summed by the ordinary methods of elementary algebra, and we find

$$\mu_2 = \frac{n(n-1)(2n+5)}{18}. \qquad (16.35)$$

In a like manner it appears that

$$\mu_4 = n(n-1)\left\{\tfrac{1}{2} + \tfrac{37}{9}(n-2) + \tfrac{37}{12}(n-2)(n-3) + \tfrac{16}{25}(n-2)(n-3)(n-4) + \tfrac{1}{27}(n-2)(n-3)(n-4)(n-5)\right\}. \qquad (16.36)$$
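As a check on the second and fourth moments of S, the following sketch enumerates every ranking for small n and compares the exact moments with closed forms. The function names are assumptions made for this illustration; the fourth moment is written in the polynomial form n(n − 1)(100n⁴ + 328n³ − 127n² − 997n − 372)/2700, which is assumed here as the closed form for comparison.

```python
from fractions import Fraction
from itertools import permutations


def kendall_s(perm):
    """Score S of a ranking against the natural order: concordant minus discordant pairs."""
    n = len(perm)
    return sum((perm[j] > perm[i]) - (perm[j] < perm[i])
               for i in range(n) for j in range(i + 1, n))


def exact_moments(n):
    """Second and fourth moments of S over all n! rankings (the mean is zero)."""
    values = [kendall_s(p) for p in permutations(range(n))]
    total = len(values)
    mu2 = Fraction(sum(v ** 2 for v in values), total)
    mu4 = Fraction(sum(v ** 4 for v in values), total)
    return mu2, mu4


def mu2_formula(n):
    return Fraction(n * (n - 1) * (2 * n + 5), 18)


def mu4_formula(n):
    # Assumed polynomial form of the fourth moment, used only for comparison.
    poly = 100 * n ** 4 + 328 * n ** 3 - 127 * n ** 2 - 997 * n - 372
    return Fraction(n * (n - 1) * poly, 2700)


if __name__ == "__main__":
    for n in range(2, 6):
        print(n, exact_moments(n), (mu2_formula(n), mu4_formula(n)))
```

Exact rational arithmetic is used so that agreement is an equality and not a matter of rounding.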






16.22. To prove that the distribution of τ tends to normality as n → ∞ we shall
show that

$$\mu_{2a} \sim \frac{(2a)!}{2^a\,a!}\,(\mu_2)^a.$$

Consider the effect of operating on f in (16.32) by θ 2a times and then putting x = 1. There
will appear terms like

fr*“ + (r ~ 2)*« + . . . + (r - 2)*“ + r*» 


nl-i 


-}■ 




. . . +{r - 2)8.-i f- < - (f - 2) + » ♦ . (< -f 2) + 






etc. Any term with an odd superscript vanishes. Consider now the sum of terms like 

fr » + . . , + + • • • +<*1 


*'{■ 


-n- 


t 


+ - • . ± . “* 1 . ( 16 . 37 ) 


It will be shown below that this term contributes the greatest power of n to the sum
giving $n!\,\mu_{2a}$.

In virtue of the multinomial form of Leibniz' theorem on the differentiation of a product,
the factor by which this term is multiplied in the expansion of $\theta^{2a} f$ is

$$\frac{(2a)!}{2!\,2!\cdots 2!} = \frac{(2a)!}{2^a}.$$

Hence {Sum of terms like (16.37)} . . . (16.38) 

A® 

I . . 

Each of these terms is of type — (r® + (r — 2)^ + • . . (r — 2)* + r^) i.o. is of order —. 

f o 

The sum will then tend to the sum of tenns like ^-(P.22 . • . a*), each term containing 

a squares of the numbers 1, 2 ... n — 1. Call this tt,. 

Then is —, times the sum of terms in 
al 

i{l* + 2* + . . . . . . .(16.39) 

3* 


which contain a different factors. 
Now (16.39) is of order — ~ fi^. 


Hence if tends to equality with the sum (16.39) 




a! 


and hence, from (16.38) 




(2«)! W 

2“ a! 


We have then to show that (16.38) tends asymptotically to the sum of its terms a! i.e. 
that sums of terms like 

1*.2* ... (a - 1)*, 1«.2* ... (a - 2)* 


tend in comparison to zero. This may be shown inductively. Consider first of aU 
(1* + 2* + . . . (n - 1)*}* = 27t, + 1« + 2« 4- ...(»- 1)«. 





The expresaion on the left -g-. But the sum of fourth powers on the right ~ 

IF 

is of lower order. Hence the sum on the right ~ 2jtt. We then have 
{1* + 2* -f . . . - !)*}» 1)*} 

~ 671, + terms of type 1*. 2®. 

These terms will be less in sum than 

2{1* + 2* + 2« 4- . . . (w - 1)*} 


which 


■ 2 . 


«* ra* 
8*5^’ 


of degree 8. 


But the expression on the left is of degree 9. 


w 


, which 


Hence 


{1* + 2* 4- . . , (n — 1)®}® ~67r„ and “Bo on. 


We can now justify the assertion that the maximum power of n arises from terms like 
(1®.2* . . . a*). In fact, by a similar line of reasoning to that just given it will appear 
that sums of terms like (1*.2® . . . (a ~ 1)®) are of lower degree in n. This completes 
the demonstration. 


16.23. In using the normal distribution to approximate to the S-distribution it is
desirable to make a correction for continuity by subtracting unity (half the interval) from
S in order to obtain the probability that a given value will be attained or exceeded. For
instance, when n = 9 we have from (16.35)

$$\operatorname{var} S = \frac{9 \cdot 8 \cdot 23}{18} = 92.$$

The normal deviate corresponding to S = 20 is then $19/\sqrt{92} = 1.981$. The probability
of a normal deviate as great as or greater than this is 0.0238. The value from Table 16.4
is 0.022. Had we made no correction for continuity we should have found a normal deviate
of 2.085 with a probability of 0.0185.
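The comparison just made can be repeated numerically. The sketch below is illustrative only: the helper names are assumptions, the exact tail is obtained by multiplying out (16.32), and the normal tail uses the complementary error function from the standard library.

```python
from math import erfc, factorial, sqrt


def s_frequencies(n):
    """Frequencies of S over all n! rankings, from the product (16.32)."""
    freq = {0: 1}
    for k in range(2, n + 1):
        nxt = {}
        for s, count in freq.items():
            for step in range(-(k - 1), k, 2):
                nxt[s + step] = nxt.get(s + step, 0) + count
        freq = nxt
    return freq


def exact_tail(n, s0):
    """Exact probability that S is at least s0."""
    freq = s_frequencies(n)
    return sum(c for s, c in freq.items() if s >= s0) / factorial(n)


def normal_tail(n, s0, corrected=True):
    """Normal deviate and upper-tail probability, with or without the continuity correction."""
    variance = n * (n - 1) * (2 * n + 5) / 18
    deviate = (s0 - (1 if corrected else 0)) / sqrt(variance)
    return deviate, 0.5 * erfc(deviate / sqrt(2))


if __name__ == "__main__":
    print("exact      ", exact_tail(9, 20))
    print("corrected  ", normal_tail(9, 20, corrected=True))
    print("uncorrected", normal_tail(9, 20, corrected=False))
```

For n = 9 and S = 20 the corrected approximation lies closer to the exact tail than the uncorrected one, in line with the figures quoted above.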


Example 16.4 

In 16.6 we found for a certain ranking of 10, τ = 0.111, S = 5. The Spearman
coefficient for the same ranking, 0.139, has already been seen to be non-significant. What
conclusion should we reach about τ on this point?

From Table 16.4 it is seen that the probability of a deviation greater than or equal to
5 is 0.364, and that of a deviation greater than or equal to 5 in absolute value is then
0.73 approximately. The corresponding value for ρ is 0.70. In either case the coefficient
could well have arisen from an "independent" population and is not significant.


16.24. Different as ρ and τ are in conception and method of calculation, they are very
closely related. Cogent reasons (but not a rigorous proof) have been given for belief in the
validity of the equation (for the population in which all rankings occur equally frequently)

$$\operatorname{cov}\{S, \Sigma(d^2)\} = -\tfrac{1}{18}\,n(n+1)^2(n-1),$$

from which the product-moment correlation between ρ and τ is

$$\frac{2(n+1)}{\sqrt{2n(2n+5)}} \sim 1 - \frac{1}{4n} \qquad (16.40)$$





(Kendall and others, 1938). For values of n occurring in practice the correlation between
ρ and τ is thus very high. It also appears that the regression of ρ on τ is approximately
linear over the material part of the range, that is, unless both are very close to unity. In
such a case, recalling the values of the variances of the two coefficients, we shall have

$$\rho \approx \tau\sqrt{\frac{1}{n-1}\cdot\frac{18n(n-1)}{4(2n+5)}} \sim \frac{3\tau}{2},$$

so that τ will be about two-thirds of the value of ρ when n is large.
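The correlation (16.40) can be examined directly for small n by listing every ranking. The following is a sketch under the assumption that all n! rankings are equally likely; the function names are introduced here for illustration only.

```python
from itertools import permutations
from math import sqrt


def rho_tau_correlation(n):
    """Product-moment correlation between Spearman's rho and tau over all n! rankings."""
    taus, rhos = [], []
    for p in permutations(range(n)):
        s = sum((p[j] > p[i]) - (p[j] < p[i])
                for i in range(n) for j in range(i + 1, n))
        d2 = sum((p[i] - i) ** 2 for i in range(n))
        taus.append(2 * s / (n * (n - 1)))
        rhos.append(1 - 6 * d2 / (n ** 3 - n))
    count = len(taus)
    mean_t, mean_r = sum(taus) / count, sum(rhos) / count
    cov = sum((t - mean_t) * (r - mean_r) for t, r in zip(taus, rhos)) / count
    var_t = sum((t - mean_t) ** 2 for t in taus) / count
    var_r = sum((r - mean_r) ** 2 for r in rhos) / count
    return cov / sqrt(var_t * var_r)


def formula(n):
    """The closed form 2(n + 1) / sqrt(2n(2n + 5)) of (16.40)."""
    return 2 * (n + 1) / sqrt(2 * n * (2 * n + 5))


if __name__ == "__main__":
    for n in range(3, 8):
        print(n, round(rho_tau_correlation(n), 6), round(formula(n), 6))
```

Enumeration is practicable only for small n, since the number of rankings grows as n!.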


Grades

16.25. Up to this point we have considered the problem of rank correlation without
reference to any variate system which might underlie the rankings. In certain classes of
inquiry this is inevitable; for example, we might shuffle a pack of cards and use the rank
correlation between the orders before and after shuffling to measure the efficacy of the
process of mixing. The early theory of rank correlation was, however, developed from
rather a different view-point. The qualities considered were measurable, and always in
theory (and often in practice) it was possible to find a product-moment coefficient of
correlation. The use of Spearman's ρ was regarded as a substitute for such a coefficient,
suitable either because the necessary measurements could not be carried out, whereas the
ranking could, or because time was saved in working out rank correlations.

It is not immediately evident what meaning can be attached to ranking in a continuous
population, for the members thereof are not denumerable.

The remark of 16.5 offers one way of overcoming the difficulty. The ranking of an
individual as r can be regarded as a numerical statement to the effect that there are (r − 1)
members "above" that individual, that is to say (r − 1) members who are given precedence.
Quantities have already been considered in connection with continuous populations which
express the same idea, namely, the quantiles. The pth decile, for example, is the variate-
value such that p tenths of the total frequency lie below it. We will then define the grade
of an individual as the proportion of the total frequency with a lower variate-value than
that borne by that individual. If we have a discontinuous population N in number, the
grade of an individual ranked according to the variate-values as r (from the lower to the
higher values) will be (r − 1)/N. If the population is continuous its members cannot be
ranked; but if we choose a sample of n members and rank them, an estimate of the grade
of the rth member may be obtained by assuming that one-half of that member is to be
assigned to each of the ranges into which its variate-value divides the variate-range, so that
its grade is then taken to be

$$\frac{(r-1) + \frac{1}{2}}{n} = \frac{r - \frac{1}{2}}{n}.$$
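The estimated grade (r − ½)/n is easily computed from a sample. The function below is a minimal sketch; its name is an assumption, and it supposes that no two sample values are tied.

```python
def sample_grades(values):
    """Estimated grade (r - 1/2)/n of each sample member, r being its rank from the lowest."""
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])  # indices from lowest to highest value
    grades = [0.0] * n
    for rank, index in enumerate(order, start=1):
        grades[index] = (rank - 0.5) / n
    return grades


if __name__ == "__main__":
    print(sample_grades([10, 30, 20]))
```

The grades so obtained always lie strictly between 0 and 1 and are symmetrical about ½.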

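16.26. For a continuous bivariate population there will be no rank correlation, but
there will, in general, be a grade correlation. Consider the bivariate normal population
whose frequency function is

$$z = \frac{1}{2\pi\sqrt{1-\rho'^2}}\exp\left\{-\frac{1}{2(1-\rho'^2)}\,(x^2 - 2\rho' xy + y^2)\right\}, \qquad (16.41)$$

where, to avoid confusion with Spearman's ρ, we have denoted the product-moment
coefficient by ρ'.

Let

$$\xi = \frac{1}{\sqrt{2\pi}}\int_{-\infty}^{x} e^{-\frac{1}{2}t^2}\,dt, \qquad \eta = \frac{1}{\sqrt{2\pi}}\int_{-\infty}^{y} e^{-\frac{1}{2}t^2}\,dt. \qquad (16.42)$$

Then ξ and η are the grades and if x and y are independent so are ξ and η. ξ is a function
of x and is distributed in the form

$$dF(\xi) = d\xi, \qquad 0 \le \xi \le 1, \qquad (16.43)$$

and similarly for η. Thus the mean and variance of both ξ and η are ½ and 1/12 respectively.
For the Spearman coefficient between ξ and η we may then take

$$\rho = 12\int_{-\infty}^{\infty}\int_{-\infty}^{\infty} \xi\eta z\,dx\,dy - 3, \qquad (16.44)$$

remembering, however, that this is a generalisation of ρ to grades. From (16.44) we then
have

$$\frac{d\rho}{d\rho'} = 12\int_{-\infty}^{\infty}\int_{-\infty}^{\infty} \xi\eta\,\frac{\partial z}{\partial\rho'}\,dx\,dy.$$

Now

$$\log z = -\frac{1}{2(1-\rho'^2)}\,(x^2 - 2\rho' xy + y^2) - \tfrac{1}{2}\log(1-\rho'^2) - \text{constant}.$$

Thus

$$\frac{1}{z}\frac{\partial z}{\partial\rho'} = -\frac{\rho'}{(1-\rho'^2)^2}\,(x^2 - 2\rho' xy + y^2) + \frac{xy}{1-\rho'^2} + \frac{\rho'}{1-\rho'^2} = \frac{1}{z}\frac{\partial^2 z}{\partial x\,\partial y},$$

and hence

$$\frac{d\rho}{d\rho'} = 12\int_{-\infty}^{\infty}\int_{-\infty}^{\infty} \xi\eta\,\frac{\partial^2 z}{\partial x\,\partial y}\,dx\,dy.$$

By a partial integration with respect to x this is equal to

$$12\int_{-\infty}^{\infty} dy\left[\xi\eta\,\frac{\partial z}{\partial y}\right]_{x=-\infty}^{x=\infty} - 12\int_{-\infty}^{\infty} dy\int_{-\infty}^{\infty} \eta\,\frac{\partial\xi}{\partial x}\,\frac{\partial z}{\partial y}\,dx.$$

The first term vanishes and thus

$$\frac{d\rho}{d\rho'} = -12\int_{-\infty}^{\infty} dy\int_{-\infty}^{\infty} \eta\,\frac{\partial\xi}{\partial x}\,\frac{\partial z}{\partial y}\,dx.$$

By a partial integration with respect to y we find

$$\frac{d\rho}{d\rho'} = 12\int_{-\infty}^{\infty}\int_{-\infty}^{\infty} \frac{\partial\xi}{\partial x}\,\frac{\partial\eta}{\partial y}\,z\,dx\,dy,$$

whence, from (16.42),

$$\frac{d\rho}{d\rho'} = \frac{12}{4\pi^2\sqrt{1-\rho'^2}}\int_{-\infty}^{\infty}\int_{-\infty}^{\infty} \exp\left\{-\tfrac{1}{2}(x^2+y^2) - \frac{x^2 - 2\rho' xy + y^2}{2(1-\rho'^2)}\right\}dx\,dy = \frac{6}{\pi(4-\rho'^2)^{\frac{1}{2}}}.$$

Integrating we have, since ρ vanishes with ρ',

$$\rho = \frac{6}{\pi}\sin^{-1}\frac{\rho'}{2}, \qquad \text{or} \qquad \rho' = 2\sin\frac{\pi\rho}{6}. \qquad (16.45)$$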

16.27. This formula is due to K. Pearson, but its value is problematical. It represents
the relationship between the product-moment and the grade correlations when the variates
are normal. It has, however, been used to transform a rank correlation obtained from
a small sample of n values into a putative product-moment coefficient in that sample, or
even worse, in the population from which the sample is derived, whether normal or not.
The reader may care to list for himself the assumptions made in adopting such a procedure
and to reflect on their justification. We shall not notice the process again, but we may
note that in no case is ρ very different from 2 sin(πρ/6) in numerical value. If ρ = 0.6,
2 sin(πρ/6) = 0.618, and this is about the greatest difference that can occur.
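The size of the difference between ρ and 2 sin(πρ/6) can be tabulated quickly. The sketch below is illustrative; the two function names are assumptions, and the relation itself holds only for grades of a normal population as stated above.

```python
from math import asin, pi, sin


def grade_from_product_moment(r):
    """Grade correlation corresponding to a product-moment correlation r (normal case)."""
    return 6 / pi * asin(r / 2)


def product_moment_from_grade(rho):
    """Product-moment correlation corresponding to a grade correlation rho (normal case)."""
    return 2 * sin(pi * rho / 6)


if __name__ == "__main__":
    grid = [i / 1000 for i in range(1001)]
    worst = max(grid, key=lambda rho: product_moment_from_grade(rho) - rho)
    print("rho = 0.6 gives", round(product_moment_from_grade(0.6), 3))
    print("largest difference", round(product_moment_from_grade(worst) - worst, 4),
          "near rho =", worst)
```

The difference vanishes at ρ = 0 and ρ = 1 and is greatest a little below ρ = 0.6.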

16.28. Equation (16.45) has also been advocated as an easy, though perhaps
inaccurate, method of calculating a product-moment coefficient. The idea is that when
a set of bivariate values is given they shall be replaced by ranks, the rank coefficient
calculated, and the value of ρ' derived from (16.45). Apart from the theoretical objections,
such a procedure involves no saving of labour if the number of values is greater than 30 or 40.
Various formulae have been offered for the standard error of an estimate of the parent
product-moment correlation based on (16.45). Some of those in current statistical
text-books are incorrect, and it may be doubted whether the use of any one is justified. The
reader may consult Eells (1929) for a list of these formulae.


The Case of m Rankings 

16.29. We now consider the more general case in which there are m rankings of n
instead of two. Our problem is to discuss the general agreement among the set of m.

It is natural in the first instance to consider the average ρ or τ in the ½m(m − 1) possible
pairs which can be chosen from the set of m. For example, if we have three rankings of
six as follows:

P    5  4  1  6  3  2
Q    2  3  1  5  6  4
R    4  1  6  3  2  5          (16.46)

the Spearman ρ's between PQ, QR and RP respectively are 11/35, −19/35, −19/35, so that the
average ρ, say $\rho_{av}$, is equal to −27/105 = −0.26. We shall consider a slightly different
coefficient linearly related to $\rho_{av}$.

Suppose we sum the ranks in the columns of (16.46), obtaining the numbers

11  8  8  14  11  11.

These numbers must sum to 63 (and in general to ½mn(n + 1)) and reflect the degree of
resemblance among the rankings. If the concordance were perfect the sums would be
3, 6, 9, 12, 15, 18, though not necessarily, of course, in that order, and in such a case would
be as different as possible. On the other hand, when there is little or no resemblance, as
in the example given, the sums are approximately equal. It is thus natural to take the
variance of these sums as providing a measure of the ranking concordance.

Let S be the sum of the squares of deviations from the mean ½m(n + 1). If the
concordance is perfect the sums are m, 2m, ..., nm and the sum S is

$$\frac{m^2(n^3 - n)}{12}.$$

Write then

$$W = \frac{12S}{m^2(n^3 - n)}. \qquad (16.47)$$

Then W may vary from 0 to 1 and we shall call it the coefficient of concordance. In the
above example it will be found that S = 25.5, W = 0.16.
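The arithmetic of the example may be set out as a short routine. This is a sketch only; the function name `concordance` is an assumption, and the rankings are taken to be complete and free from ties.

```python
def concordance(rankings):
    """Return (column sums, S, W) for m rankings of n objects, following (16.47)."""
    m, n = len(rankings), len(rankings[0])
    sums = [sum(ranking[j] for ranking in rankings) for j in range(n)]
    mean = m * (n + 1) / 2
    s = sum((total - mean) ** 2 for total in sums)
    w = 12 * s / (m ** 2 * (n ** 3 - n))
    return sums, s, w


if __name__ == "__main__":
    P = [5, 4, 1, 6, 3, 2]
    Q = [2, 3, 1, 5, 6, 4]
    R = [4, 1, 6, 3, 2, 5]
    print(concordance([P, Q, R]))
    # Perfect agreement gives the greatest possible S and W = 1.
    print(concordance([[1, 2, 3, 4, 5, 6]] * 3))
```

The second call illustrates the upper limit: with three identical rankings of six the column sums are 3, 6, ..., 18 and W is unity.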


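16.30. W is connected with $\rho_{av}$ by the relation

$$\rho_{av} = \frac{mW - 1}{m - 1}. \qquad (16.48)$$

In fact, if the rankings, measured from the mean ½(n + 1), are $x_{11}, \ldots, x_{1n}$; $x_{21}, \ldots, x_{2n}$;
...; $x_{m1}, \ldots, x_{mn}$, the average ρ is

$$\rho_{av} = \frac{1}{m(m-1)}\cdot\frac{12}{n^3 - n}\sum_{i \ne k}\sum_{j=1}^{n} x_{ij}x_{kj}$$

$$= \frac{12}{m(m-1)(n^3 - n)}\left\{\sum_{j=1}^{n}\Big(\sum_{i=1}^{m} x_{ij}\Big)^2 - \sum_{i=1}^{m}\sum_{j=1}^{n} x_{ij}^2\right\}$$

$$= \frac{12}{m(m-1)(n^3 - n)}\left\{S - \frac{m(n^3 - n)}{12}\right\}$$

$$= \frac{mW - 1}{m - 1}. \qquad (16.49)$$

$\rho_{av}$ is the intra-class correlation for the m sets of ranks considered as variate-values. It
cannot be less than −1/(m − 1).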
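The relation between W and the average ρ can be confirmed on the three rankings of 16.29. The sketch below computes the average of the pairwise Spearman coefficients directly and compares it with (mW − 1)/(m − 1); the function names are assumptions made for the illustration.

```python
from itertools import combinations


def spearman(a, b):
    """Spearman's rho between two untied rankings of the same n objects."""
    n = len(a)
    return 1 - 6 * sum((x - y) ** 2 for x, y in zip(a, b)) / (n ** 3 - n)


def average_rho(rankings):
    """Mean of Spearman's rho over all pairs of rankings."""
    pairs = list(combinations(rankings, 2))
    return sum(spearman(a, b) for a, b in pairs) / len(pairs)


def coefficient_w(rankings):
    """Coefficient of concordance W of (16.47)."""
    m, n = len(rankings), len(rankings[0])
    mean = m * (n + 1) / 2
    s = sum((sum(r[j] for r in rankings) - mean) ** 2 for j in range(n))
    return 12 * s / (m ** 2 * (n ** 3 - n))


if __name__ == "__main__":
    data = [[5, 4, 1, 6, 3, 2], [2, 3, 1, 5, 6, 4], [4, 1, 6, 3, 2, 5]]
    m = len(data)
    print(average_rho(data), (m * coefficient_w(data) - 1) / (m - 1))
```

Both printed values should agree, at about −0.26 for this example.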


16.31. To test whether an observed value of W is significant it is necessary to consider
the distribution of W (or, more conveniently, of S) in the population obtained by permuting
the n ranks in all possible ways in each of the m rankings. No generality is lost in supposing
one ranking fixed and the others will then give rise to $(n!)^{m-1}$ values of S. We will ascertain
the distributions for some low values of n and m and show how to approximate for larger
values by the use of a continuous distribution.

For the case m = 2 the distribution of S is that of Σ(d²) in Table 16.1. The distributions
have also been found for n = 3, m = 2 to 10; n = 4, m = 2 to 6; and n = 5, m = 3.
Tables 16.5 to 16.8 give the probabilities based on these distributions in a form analogous
to Tables 16.2 and 16.4.

TABLE 16.5

Concordance Coefficient W. Probability that a given Value of S will be Attained or Exceeded
for n = 3 and Values of m from 2 to 10.


Values of m 


s 

2 

3 

4 

5 

6 

7 

8 

9 

10 

^ 0 

1000 

1000 

1*000 

1*000 

1*000 

1*000 

1*000 

1*000 

1*000 

2 

0*833 

0*044 

0*931 

0*964 

0*956 

0*964 

0-967 

0*971 

0*974 

6 

0*600 

0-628 

0*663 

0*691 

0*740 

0*768 

0*794 

0*814 

0*830 

8 

0167 

0*361 

0*431 

0-622 

0*570 

0*620 

0*654 

0*685 

0*710 

14 


0*194 

0*273 

0*367 

0*430 

0*486 

0*531 

0*569 

0*601 

18 


0*028 

0*125 

0*182 

0*252 

0-306 

0356 

0-398 

0*436 

24 



0*069 

0-124 

0*184 

0 237 

0'285 

0*328 

0*368 

26 



0*042 

0*093 

0*142 

0*192 

0-236 

0-278 

0*316 

32 



0*0046 

0039 

0-072 

0*112 

0*149 

0*187 

0*222 

38 




0-024 

0*052 

0-085 

0-120 

0*164 

0*187 

42 




0*0085 

0*029 

0*051 

0*079 

0-107 

0-135 

60 




00»77 

0-012 

0*027 

0*047 

0 069 

0 092 

64 





0-0081 

0*021 

0*038 

0*057 

0*078 

66 





0*0055 

0-016 

0-030 

0*048 

0 066 

62 





0*0017 

0*0084 

0-018 

0 031 

0 046 

72 





00 M 3 

0*0036 

00099 

0 019 

0*030 

74 



j 



0*0027 

0 OOHO 

0*016 

0-026 

78 






0*0012 

0*0048 

0*010 

0-018 

86 






00®32 

0*0024 

0*0060 

0-012 

06 






0*0332 

0*0011 

0*0035 

00076 

98 






0 * 0<21 

0 * 0®86 

0*0029 

0-0063 

104 





1 


00»26 

0*0013 

0-0034 

114 







oom 

00»66 

0-0020 

122 







0 * 0*61 

0*0835 

0-0013 

126 







00*61 

0 * 0»20 

0 - 0“83 

128 







00*^36 

oon )7 

0 - 0»51 

134 








0 * 0*54 

0 - 0>37 

146 








0 0*11 

0 - 0»18 

160 








oom 

0 - 0 »ll 

162 








OOMl 

0 - 0*86 

168 








O-OMl 

0 - 0*44 

162 








0 * 0<^00 

0 - 0*20 

168 









0 - 0*11 

182 



i 

1 






0 - 0*21 

200 








j 

0 - 0’99 




TABLE 16.6

Concordance Coefficient W. Probability that a given Value of S will be Attained or Exceeded
for n = 4 and m = 3 and 5.


s 

fM 3 

m 6 

S 

tn «9 5 

1 

1000 

1-000 

61 

0-056 

3 

0-968 

0975 

65 

0-044 

5 

0-910 

0-944 

67 

0-034 

9 

0-727 

0-857 

69 

0-031 

11 

0-608 

0-771 

73 

0-023 

13 

0-624 

0-709 

76 

0-020 

17 

0 446 

0-662 

77 

0-017 

19 

0-342 

0-661 

81 


21 

0 300 

0-521 

83 

0-0087 

25 

0-207 

0-445 

85 

0-0067 

27 

0-175 

0-408 

89 

0-0065 

29 

0-148 

0 372 

91 

00031 

33 

0-075 

0-298 

93 


35 

0-064 

0-260 

97 


37 

0-033 

0-226 

99 

0-0016 


0-017 


101 

0-0014 

43 

0-0017 

0-162 

105 

0-0864 

45 

0-0017 


107 

0 - 0*33 

49 ! 



109 

0 - 0*21 

51 1 



113 


53 


0093 

117 


57 1 



125 


59 1 

1 

1 

0-067 









TABLE 16.7

Concordance Coefficient W. Probability that a given Value of S will be Attained or Exceeded
for n = 4 and m = 2, 4 and 6.





















TABLE 16.8

Concordance Coefficient W. Probability that a given Value of S will be Attained or Exceeded
for n = 5 and m = 3.


8 

tn «= 3 

8 

m «= 3 

0 

1*000 

44 

0-236 

2 

1000 

46 

‘ 0*213 

4 

0*988 

48 


6 

0*972 

50 


8 

0*941 

52 


10 

0*914 

54 

0*117 

12 

0*845 

56 

0*096 

14 

0 831 

58 

0*080 

16 

0*768 

60 

0*063 

18 

0*720 

62 

0056 

20 

0*682 

64 

0-046 

22 

0*649 

66 

0*038 

24 

0*595 

68 

0*028 

26 

0*559 

70 

0*026 

28 

0*493 

72 

0*017 

30 

0*475 

74 

0*015 

32 

0*432 

76 

0*0078 

34 

0*406 

78 

0*0053 

36 

0 347 

80 

0-0040 

38 

0*326 

82 1 

0-0028 

1 40 

0*291 

86 

0 - 0»90 

1 “ _ 

0*253 

90 

0 - 0‘69 


16.32. These distributions may be obtained by two methods. The first consists
of building up the distribution for (m + 1) and n from that for m and n. For example,
with m = 2 and n = 3 we have the following values of the sums of ranks, measured about
their mean:

Type            Frequency
−2   0   2          1
−2   1   1          2
−1   0   1          2
 0   0   0          1

Here −2, 1, 1, and 2, −1, −1 are taken to be identical types, for they give the same
value of S and will also give similar types when we proceed to m = 3 as follows.

In the case m = 3, each of the above types will appear added to the six permutations
of −1, 0, 1; e.g. the type −2, 0, 2 will give one each of −3, 0, 3; −3, 1, 2; −2, −1, 3;
−2, 1, 1; −1, −1, 2; and −1, 0, 1. These types are then counted for each of the
four basic types of m = 2 and we get:

Type            Frequency
−3   0   3          1
−3   1   2          6
−2   0   2          6
−2   1   1          6
−1   0   1         15
 0   0   0          2









The case m = 4 is treated by considering the numbers of types obtained by adding
the six permutations of −1, 0, 1 to the types for m = 3; and so on.

This method is quite convenient for n = 2 and n = 3. For n = 4 it becomes difficult
owing to the labour of considering 24 permutations at each stage and to the increase in
the number of types. For n = 5 there are 120 permutations and the labour becomes
excessive.
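The first method lends itself to mechanical computation. The following sketch builds the types for m rankings from those for m − 1 by adding every permutation of the deviations; the function names are assumptions, and a type is recorded in sorted order with a type and its negative treated as identical, as in the text.

```python
from collections import Counter
from fractions import Fraction
from itertools import permutations


def canonical(sums):
    """Represent a type by the smaller of its sorted form and its sorted negative."""
    return min(tuple(sorted(sums)), tuple(sorted(-v for v in sums)))


def type_frequencies(n, m):
    """Frequencies of the types of rank sums (about their mean) for m rankings of n."""
    base = tuple(Fraction(2 * r - (n + 1), 2) for r in range(1, n + 1))
    perms = list(permutations(base))
    freq = Counter({canonical(base): 1})  # the first ranking is held fixed
    for _ in range(m - 1):
        nxt = Counter()
        for sums, count in freq.items():
            for p in perms:
                nxt[canonical(tuple(a + b for a, b in zip(sums, p)))] += count
        freq = nxt
    return freq


def s_distribution(n, m):
    """Frequencies of S, the sum of squared deviations of the rank sums."""
    dist = Counter()
    for sums, count in type_frequencies(n, m).items():
        dist[sum(v * v for v in sums)] += count
    return dist


if __name__ == "__main__":
    for sums, count in sorted(type_frequencies(3, 3).items()):
        print([int(v) for v in sums], count)
    print(sorted((int(s), c) for s, c in s_distribution(3, 3).items()))
```

Exact fractions are used for the deviations so that even values of n, for which the deviations are half-integers, are handled without rounding.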

The second method is a generalisation of the jF-funotion of 16.12. For m rank* 
ings, the distribution of 8 is given by the expansion of an m-dimensional JF-function. 
For example, with m = 3 there would ^ a three-dimensional ^-function the bottom plane 
of which would be 


«{ 



. • . (t 



^_^3(n4-l)|8 

at J 

. . * a 







The plane above this would be 




[«+3 



8 


and 80 on. 

The .^-function is difficult to handle in more than three dimensions, but for the two- 
and three-dimensional case it is manageable and was used to obtain the distribution of 
for » = 5 and m = 3* 


16.33. We now proceed to find the first four moments of the distribution of S. The
method is similar to that used for ρ but is somewhat more complicated.

Writing $x_{ij}$ for the deviation from the mean of the jth member of the ith
ranking, we have, as in 16.30,


^ 1 m - 1 

W --h- Pat 

m m 


Write 



1 m n 

w® — n 




i 2 , 


at 




^ki 




i ^ k 


(16.60) 

(16.51) 


where i, k can have all values from 1 to m and thus any term $R_{ik}$ appears again as $R_{ki}$
in the sum $\Sigma R_{ik}$. Then the moments of W are derivable from those of the R's, which
in turn are derivable from those of Spearman's ρ. In fact, writing $N = (n^3 - n)/12$, we have
from (16.18) and (16.19)

$$E(R_{ik}^2) = \frac{1}{n-1}\,N^2, \qquad E(R_{ik}^4) = \frac{3(25n^3 - 38n^2 - 35n + 72)}{25n(n+1)(n-1)^3}\,N^4. \qquad (16.52)$$


We next require the moments of $\Sigma R_{ik}$,
but complications arise because in some cases the R's are correlated among themselves.
Any two R's are independent, i.e.

$$E(R_{ik}R_{lm}) = 0, \qquad (16.53)$$

unless of course i = l, k = m. This may be seen by reference to (16.51), the x's being
independent. Similarly

$$E(R_{ik}R_{lm}R_{np}) = 0,$$

except when we have a set with circular suffixes such as

$$E(R_{ik}R_{kl}R_{li}), \qquad (16.54)$$

for in this case the x's cease to be independent. Similarly any four R's are independent
unless they form a set such as

$$R_{ik}R_{kl}R_{lm}R_{mi}. \qquad (16.55)$$

We have

E{Iiik Rki Eu) 2^ x^D Xff 2^ xiy xA 

\ aaml y„X ' 


■== E [ {Ix^^ x,„ + S x^ x,f + Xip z^) } {27 z,^ x^^ }] 

= E [ {JS7(x*,a) E Xi^ Xi^) + E{x^ x*^) {Exi^ - Zx,-, Zx^}] x [Z x,^ x^^] 


= {-^(a^fca*) - Z(Xfc,Xjt^)} Z(Zx<.x,,)* 

“ (n” 1)* * • * • 


(16.66) 


We then have

$$E(\rho_{av}) = \frac{1}{m(m-1)N}\,\Sigma\,E(R_{ik}) = 0, \qquad (16.57)$$

$$E(\rho_{av}^2) = \frac{1}{m^2(m-1)^2N^2}\,E(\Sigma R_{ik})^2 = \frac{2}{m^2(m-1)^2N^2}\,\Sigma\,E(R_{ik}^2) = \frac{2}{m^2(m-1)^2N^2}\cdot\frac{m(m-1)N^2}{n-1} = \frac{2}{m(m-1)}\cdot\frac{1}{n-1}, \qquad (16.58)$$

$$E(\rho_{av}^3) = \frac{1}{m^3(m-1)^3N^3}\,E(\Sigma R_{ik})^3 = \frac{8m(m-1)(m-2)}{m^3(m-1)^3N^3}\,E(R_{ik}R_{kl}R_{li}) = \frac{8(m-2)}{m^2(m-1)^2}\cdot\frac{1}{(n-1)^2}, \qquad (16.59)$$

all other terms vanishing.

From these results we have, for the first three moments of W,

$$\mu_1' \text{ (about 0)} = \frac{1}{m}, \qquad (16.60)$$

$$\mu_2 = \frac{2(m-1)}{m^3(n-1)}, \qquad (16.61)$$

$$\mu_3 = \frac{8(m-1)(m-2)}{m^5(n-1)^2}. \qquad (16.62)$$

In a similar way (we omit the lengthy algebra) it may be shown that

$$\mu_4 = \frac{24(m-1)}{m^7(n-1)^3}\left\{\frac{25n^3 - 38n^2 - 35n + 72}{25n(n+1)} + 2(n-1)(m-2) + \tfrac{1}{2}(n+3)(m-2)(m-3)\right\}. \qquad (16.63)$$
. (16.63) 


16.34. The distribution of W is evidently asymmetrical since $\mu_3 \ne 0$ unless m = 2.
Consider then the possibility of approximating to the distribution by the Type I form

$$dF \propto W^{p-1}(1 - W)^{q-1}\,dW, \qquad 0 \le W \le 1. \qquad (16.64)$$

The first two moments of this are

$$\mu_1' \text{ (about 0)} = \frac{p}{p+q}, \qquad \mu_2 = \frac{pq}{(p+q)^2(p+q+1)}. \qquad (16.65)$$

Identifying the values of (16.60), (16.61) and (16.65), we find

$$\frac{p}{p+q} = \frac{1}{m}, \qquad \frac{pq}{(p+q)^2(p+q+1)} = \frac{2(m-1)}{m^3(n-1)},$$

giving

$$p = \tfrac{1}{2}(n-1) - \frac{1}{m}, \qquad q = (m-1)\left\{\tfrac{1}{2}(n-1) - \frac{1}{m}\right\}. \qquad (16.66)$$

It will be found that the third moment about the mean of the Type I form is

$$\frac{8(m-1)(m-2)}{m^4(n-1)(mn - m + 2)} = \frac{8(m-1)(m-2)}{m^5(n-1)^2}\left\{\frac{m(n-1)}{m(n-1) + 2}\right\},$$

so that the third moments of the W-distribution and the Type I distribution are
approximately equal if m and n are not small. Similarly the fourth moments will be found to
differ by a small quantity. We may therefore use the Type I distribution to approximate
to that of W. It appears likely that as n, m → ∞ the distribution of W tends to the Type I
form, but this has not been rigorously demonstrated.
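The fitting of the Type I form can be checked in exact arithmetic. The sketch below takes the moments of W as given in (16.60) to (16.62), forms p and q from (16.66), and compares the moments of the fitted curve using the usual formulae for the moments of a B-distribution; the function names are assumptions introduced for the illustration.

```python
from fractions import Fraction


def w_moments(m, n):
    """Mean, variance and third central moment of W, from (16.60)-(16.62)."""
    mean = Fraction(1, m)
    var = Fraction(2 * (m - 1), m ** 3 * (n - 1))
    mu3 = Fraction(8 * (m - 1) * (m - 2), m ** 5 * (n - 1) ** 2)
    return mean, var, mu3


def type1_parameters(m, n):
    """The constants p and q of (16.66)."""
    p = Fraction(n - 1, 2) - Fraction(1, m)
    return p, (m - 1) * p


def type1_moments(p, q):
    """Mean, variance and third central moment of the Type I form (16.64)."""
    t = p + q
    mean = p / t
    var = p * q / (t ** 2 * (t + 1))
    mu3 = 2 * p * q * (q - p) / (t ** 3 * (t + 1) * (t + 2))
    return mean, var, mu3


if __name__ == "__main__":
    for m, n in [(3, 3), (6, 4), (28, 13)]:
        p, q = type1_parameters(m, n)
        print(m, n, w_moments(m, n), type1_moments(p, q))
```

The first two moments agree identically by construction; the third differs by the factor m(n − 1)/{m(n − 1) + 2}, which approaches unity as m and n increase.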


16.35. The significance of W can then be tested in the Type I distribution, namely,
by the use of incomplete B-functions. More conveniently, we may transform (16.64)
to the form

$$dF \propto \frac{e^{\nu_1 z}\,dz}{(\nu_1 e^{2z} + \nu_2)^{\frac{1}{2}(\nu_1 + \nu_2)}}$$

by the transformation

$$z = \tfrac{1}{2}\log\frac{(m-1)W}{1 - W}, \qquad \nu_1 = (n-1) - \frac{2}{m}, \qquad \nu_2 = (m-1)\nu_1, \qquad (16.67)$$

and test in the z-distribution which has been tabulated.

In making this test it is desirable, for low values of m and n, to make the usual correction
for continuity by subtracting unity from S (equation (16.47)) and increasing the divisor
$m^2(n^3 - n)/12$ by 2. Let us examine the approximation of the test in some cases wherein
the exact values are known from Tables 16.5 to 16.8.

For n = 3, m = 9, the 1 per cent. level is given approximately by S = 78 (Table 16.5).
For such a value, with continuity corrections,

$$W = \frac{78 - 1}{\dfrac{81 \times 24}{12} + 2} = 0.4695, \qquad \nu_1 = \frac{16}{9}, \qquad \nu_2 = \frac{128}{9}, \qquad z = 0.979.$$

By linear interpolation of reciprocals in Appendix Table 6 we should require, for complete
agreement, a value of z equal to 0.964.

For n = 4, m = 6, the 1 per cent. point is approximately S = 100. $\nu_1 = 8/3$, $\nu_2 = 40/3$.
We have W = 0.5556, z = 0.916. From Table 16.6 we should require a value of 0.898.

For n = 5, m = 3, there is no very convenient value of S close to the 1 per cent. point.
For P = 0.015, S = 74, and for P = 0.0078, S = 76.

For S = 74 (with continuity corrections) z = 1.020
    S = 76 (  "       "           "    ) z = 1.089.

By interpolation from the tables z = 1.076. The use of the z-test would lead to the
correct conclusion that a value of S equal to 74 falls below, and that of 76 above, the
1 per cent. point.

For values of m and n not included in Tables 16.5 to 16.8 it thus appears that the z-test
with continuity corrections will give sufficiently accurate results, if n is greater than 3,
at the 1 per cent. points. It may be presumed that the results at the 5 per cent. points are
equally good and probably better. But for finer values of significance, such as 0.1 per
cent., it is doubtful whether the test is sound. The tails of the distribution of S for moderate
values of m and n are very irregular.
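The z-transformation with its continuity corrections reduces to a few lines of arithmetic. The sketch below is illustrative; the function name and its argument layout are assumptions, and natural logarithms are used throughout.

```python
from math import log


def z_statistic(s, m, n, continuity=True):
    """Return (W, nu1, nu2, z) for the z-test of the coefficient of concordance."""
    divisor = m ** 2 * (n ** 3 - n) / 12
    if continuity:
        # Subtract unity from S and increase the divisor by 2.
        w = (s - 1) / (divisor + 2)
    else:
        w = s / divisor
    nu1 = (n - 1) - 2 / m
    nu2 = (m - 1) * nu1
    z = 0.5 * log((m - 1) * w / (1 - w))
    return w, nu1, nu2, z


if __name__ == "__main__":
    print(z_statistic(78, m=9, n=3))
    print(z_statistic(74, m=3, n=5))
    print(z_statistic(76, m=3, n=5))
```

The value of z so found is then referred to tables of the z-distribution with ν₁ and ν₂ degrees of freedom, which are in general fractional.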

16.36. A somewhat more approximate test of W has been given by Friedman (1937),
who defined a statistic

$$\chi_r^2 = m(n-1)W \qquad (16.68)$$

and showed that the distribution of $\chi_r^2$ tends to that of the Type III as m tends to infinity,
with (n − 1) degrees of freedom. This test appears reasonably satisfactory for moderate
m and n, though not so accurate as ours. Friedman has also provided (1940) some
significance levels of $\chi_r^2$ calculated on the basis of the z-test.

Example 16.5 

In some experiments in random series a pack of ordinary playing cards was shuffled 
and the order of the 13 cards of each suit from the top of the pack was noted. The pack 
was then reshuffled and again the orders noted. This was done 28 times. The question 
to discuss was whether the shuffling was good, in the sense that the cards were thoroughly 
mixed at each shuffle. 

Here, for each suit, say diamonds, we have 28 rankings of 13. The sums of ranks were 183, 137, 171, 207, 188, 160, 225, 174, 216, 192, 236, 239, 220. The mean is 196, and S = 11,522; W (without continuity corrections, which are not worth making for these values of m and n) = 0.08075, z = 0.432. This falls just beyond the 1 per cent point.

Similarly for the clubs W was found to be 0.0535; for the hearts, 0.0245; and for spades, 0.0342. None of these values is significant, and we conclude that the randomisation introduced by the shuffling was good, at all events, so far as this test was concerned. It may be added that the shuffling was done with much more care than would be taken in an ordinary game of cards.

Example 16.6 

In psychological work there has sometimes been a confusion between the determination of a measure of agreement between subjects and that of an objective order based on experimental rankings. It may therefore be as well to point out that in its psychological applications the test of W is one of concordance between judgments. There may be quite a high measure of agreement about something which is incorrect.

A number of students were given 12 photographs of persons unknown to them, and asked to rank them in what they judged from the photographs to be their intelligence. For 16 students the sums of ranks were

112, 94, 101, 84, 97, 75, 104, 84, 102, 146, 125, 124.

The mean is 104, S = 4472, W = 0.1222, z = 0.368, and is barely significant, being between the 1 per cent and the 5 per cent points.

For 111 students the sums were

818, 670, 908, 410, 706, 626, 780, 486, 596, 1044, 959, 756

W = 0.2378, z = 1.768.

This is highly significant and it is to be inferred that community of judgment exists between students or groups of students. But there was little relationship between the judgments and the intelligence of the photographed subjects as given by the Binet Intelligence Quotient.
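The figures quoted for the 16 students can be reproduced by a minimal sketch such as the following. The function name `kendall_w` is hypothetical, and the eleventh rank sum is taken here as 125, an assumption made because it is the value consistent with the stated mean of 104 and with S = 4472. W is computed as 12S divided by m²(n³ − n), without continuity corrections.

```python
import math


def kendall_w(rank_sums, m):
    """Return (S, W) for m rankings whose column sums of ranks are rank_sums."""
    n = len(rank_sums)
    mean = m * (n + 1) / 2          # expected sum of ranks per object
    S = sum((s - mean) ** 2 for s in rank_sums)
    W = 12 * S / (m * m * (n ** 3 - n))
    return S, W


# Rank sums for the 16 students (eleventh value assumed to be 125).
sums_16 = [112, 94, 101, 84, 97, 75, 104, 84, 102, 146, 125, 124]
S, W = kendall_w(sums_16, m=16)
z = 0.5 * math.log((16 - 1) * W / (1 - W))

if __name__ == "__main__":
    print(f"S = {S:.0f}, W = {W:.4f}, z = {z:.3f}")
```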

Estimation of a True Ranking 

16.37. Suppose we have m sets of n rankings which show a significant concordance. Assuming that the relations between the rankings reflect the true ranking of the objects, how are we to estimate that ranking? Or again, assuming merely a significant concordance between observers, what is the ranking “nearest” to their rankings?

An intuitive approach to this problem would probably lead us to this solution: the object whose true rank is 1 is that for which the sum of ranks is least; that whose rank is 2 is the one for which the sum of ranks is least but one; and so on. For example, if there are three rankings of five objects totalling 9, 7, 4, 10, 15, we should take the third as rank 1, the second as 2, the first as 3, the fourth as 4 and the fifth as 5.

This solution can be given a firmer theoretical basis. It is the “best” in a least-squares sense. In fact, suppose the true ranking is $X_1, X_2, \ldots, X_n$, where as usual the X's are a permutation of the first n integers. Suppose the sums of ranks are $S_1, S_2, \ldots, S_n$. Consider the sum

\[ U = \sum_{i=1}^{n} (S_i - mX_i)^2 \tag{16.69} \]

If all the rankings were correct, each $S_i$ would be $mX_i$, so that this quantity represents in a sense the divergence from complete agreement. Our “best” estimate of the X's will be given by minimising U. Now

\[ U = \sum S_i^2 + m^2 \sum X_i^2 - 2m \sum S_i X_i \]

and since the first two terms are constant we have to maximise $\sum S_i X_i$, the S's being given and the X's the numbers 1 to n. Evidently this will be done by multiplying the biggest S by n, the next biggest by (n − 1), and so on. The result follows.

There is, of course, an indeterminacy in this method if any two of the S's are equal.
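The least-squares property may be illustrated numerically. The sketch below ranks the objects by their sums of ranks and then confirms, by trying every permutation, that this ranking minimises U of (16.69). The function names are hypothetical, and the five totals are taken as 9, 7, 4, 10, 15 (an assumption: three rankings of five objects must total 45).

```python
from itertools import permutations


def estimate_ranking(rank_sums):
    """Give rank 1 to the smallest sum of ranks, rank 2 to the next, and so on."""
    order = sorted(range(len(rank_sums)), key=lambda i: rank_sums[i])
    ranks = [0] * len(rank_sums)
    for rank, index in enumerate(order, start=1):
        ranks[index] = rank
    return ranks


def divergence(rank_sums, ranks, m):
    """U = sum of (S_i - m X_i)^2 for a trial ranking X."""
    return sum((s - m * x) ** 2 for s, x in zip(rank_sums, ranks))


sums = [9, 7, 4, 10, 15]   # assumed totals for three rankings of five objects
m = 3
estimate = estimate_ranking(sums)

# Exhaustive check over all 5! candidate rankings.
best = min(permutations(range(1, 6)), key=lambda p: divergence(sums, p, m))

if __name__ == "__main__":
    print(estimate, list(best))
```

With tied sums the minimum of U is shared by more than one permutation, which is the indeterminacy noted above.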

Paired Comparisons 

16.38. When the objects which are being ranked are known to be measurable according to the quality concerned, no question as to the legitimacy of ranking arises. But cases occur in which it is by no means clear that ranking is legitimate, as for instance in the arranging of human beings according to intelligence or of pieces of music by human beings according to preference. To require an observer to carry out a ranking in such a case may be equivalent to asking him to arrange English towns in order of geographical position (which is two-dimensional), or a number of fruits according to taste (which is probably four-dimensional). The observer may attempt to comply in the full belief that he is doing something within his powers, but if the quality under consideration is not measurable on a linear scale the resulting ranking may fail to give either a real picture of his preferences or of the variation of the quality in the individuals. For example, in judgments of intelligence, it is not impossible that the observer should judge A more intelligent than B, B than C, and C than A, if the individuals are presented for his consideration one pair at a time. The likelihood of this happening is obviously increased when we are dealing with tastes in music, eatables or film stars; and in practice the event is not uncommon. Such “inconsistent” preferences can never appear in ranking, for if A is preferred to B and B to C, then A must automatically be shown as preferred to C.


16.39. We therefore consider a more general method of investigating preferences. With n objects, we shall suppose that each of the possible pairs is presented to an observer and his preference of one member of the pair noted. If the object A is preferred to B we write A → B or B ← A. The preferences of a single observer may be represented in tabular form as shown in Table 16.9.

In this table, which is shown for the six objects A to F, an entry of unity in column Y and row X means X → Y, and is thus accompanied by a complementary zero in row Y and column X. The diagonals are blocked out. For example, in the table, A → B, A → C, D → A, etc.


TABLE 16.9

Tabular Representation of Paired Comparison Schema.

        A    B    C    D    E    F
   A    -    1    1    0    1    1
   B    0    -    0    1    1    0
   C    0    1    -    1    1    1
   D    1    0    0    -    0    0
   E    0    0    0    1    -    1
   F    0    1    0    1    0    -

The arrangement of the objects A to F in the row and column headings is quite arbitrary. There are (n!)² ways of representing the same configuration of preferences in such a table according to the permutations of objects in row and column; but in practice it is generally desirable to have the order in row and column the same, and even among the n! possible arrangements so given there are often practical considerations which determine one order as more convenient than others.

16.40. Paired comparisons may also be represented geometrically by a method which can be illustrated for the case of the six objects as follows.

We represent the six objects A to F by the six vertices of a regular hexagon and join the vertices in all possible ways by straight lines. If A → B we draw an arrow on the line AB pointing from A to B. The arrows shown on Fig. 16.2 correspond to the preferences shown in Table 16.9.

16.41. If an observer makes preferences of type A → B → C → A we say that the triad ABC is inconsistent. In the geometrical representation an inconsistent triad is shown by a triangle in which all the arrows go round in the same direction. We may thus speak of a circular triad of preferences. In Fig. 16.2 the triads ACD, BEF and three others are circular.

It is also possible to have inconsistent triads of greater extent; but any such circuit must contain at least two circular triads. Suppose, for instance, that ABCD is circular, e.g. that A → B → C → D → A. Then either A → C or C → A. In the first case ACD is circular, in the second ABC. Similarly, either ABD or BCD is circular. Thus the circular tetrad must contain just two circular triads. On the other hand it is possible for a tetrad to contain circular triads without being itself circular.

Similarly, if ABCDE is circular either ABC or ACDE is circular and either BCD or BDEA is circular. If the two tetrads are circular there must be at least three circular triads (not necessarily four, because ADE may be common to both). It is easy to see by an actual example based on this configuration that there need not be more than three circular triads; and it is clear that there must be at least three. For if the tetrads are not circular then ABC and BCD must be so and then either CDE is circular or ABCE is so, adding at least one more.

Generally, it appears that a circular n-ad must contain at least (n − 2) circular triads; but it may contain more, and the fact that an n-ad contains (n − 2) circular triads does not mean that it is itself circular. In discussing inconsistences, therefore, it seems best to confine attention to circular triads, which, so to speak, constitute the inconsistent elements of the configuration, and to ignore the more ambiguous criteria associated with circular polyads of greater extent.


16.42. We now prove the following theorems:

(1) The maximum possible number of circular triads is $\frac{n^3 - n}{24}$ if n is odd and $\frac{n^3 - 4n}{24}$ if n is even; and the minimum number is zero.

(2) These limits can always be attained by some configuration of preferences.

Consider a polygon of the type shown in Fig. 16.2 with n vertices. There will be (n − 1) lines emanating from each vertex. Let $\alpha_1, \alpha_2, \ldots, \alpha_n$ be the number of lines at the respective vertices on which the arrows leave the vertex. Then

\[ \sum_{i=1}^{n} \alpha_i = \frac{n(n-1)}{2} \]

and the mean value of $\alpha_i$ is $\frac{n-1}{2}$. Define

\[ T = \sum_{i=1}^{n} \left( \alpha_i - \frac{n-1}{2} \right)^2 \tag{16.70} \]


We now show that if the direction of a preference is altered and the effect is to increase the number of circular triads by d, T is reduced by 2d; and conversely. Consider the preference A → B. The only triads affected by altering this to B → A are those containing the line AB. Suppose there are α preferences of type A → X (including A → B) and β preferences of type B → X. Then four possible types of triad arise:

A → X ← B, say p in number
A ← X → B
A → X → B, which must number α − p − 1
A ← X ← B, which must number β − p.

When the preference A → B is reversed the first two remain non-circular. The third becomes circular, the fourth ceases to be so. The reduction in the value of T is

\[ \alpha^2 - (\alpha - 1)^2 + \beta^2 - (\beta + 1)^2 = 2(\alpha - \beta - 1) = 2d, \text{ say.} \]

The increase in the number of circular triads is

\[ (\alpha - p - 1) - (\beta - p) = \alpha - \beta - 1 = d. \]

More generally, if as the result of reversing any number of preferences T is decreased by 2d, then d must be an integer and the number of circular triads must be increased by d. This clearly follows from the previous results, for the reversal of preferences can take place one at a time and the effect on T and the number of circular triads is cumulative.

We now investigate the maximum and minimum values of T. It is clear from the definition that T is greatest when the α's are the numbers 0, 1, . . ., n − 1; and this is a possible case because it corresponds to ordinary ranking. Hence max. $(T) = \frac{n^3 - n}{12}$.

For the minimum value, consider the polygon $A_1, A_2, \ldots, A_n$. Set up the preferences $A_1 \to A_2 \to A_3 \to \cdots \to A_n \to A_1$. Clearly at any vertex this results in one arrow entering and one leaving the vertex, i.e. the contribution to α is unity at each vertex. Next set up the preferences $A_1 \to A_3 \to A_5 \to \cdots$. This circuit may either visit each vertex once, or not. In the latter case we proceed to an unvisited vertex and set up the preferences $A_2 \to A_4 \to A_6 \to \cdots$, and so on. Again there will be a unit contribution to all the α's. We then set up the preferences $A_1 \to A_4 \to A_7 \to$, etc., and so on; and in this way we shall ultimately complete the preference scheme.

If n is odd all the preferences described will consist of circular tours of the polygon, and thus the value of α for each vertex will be $\frac{n-1}{2}$. If n is even, the last preference $A_i \to A_{i+n/2}$ will not be a tour but will consist of the single line joining one vertex with the symmetrically opposite vertex. Thus there will be $\frac{n}{2}$ vertices for which $\alpha = \frac{n}{2}$ and $\frac{n}{2}$ vertices for which $\alpha = \frac{n-2}{2}$. In this case $T = \frac{n}{4}$.

Now it is clear from the definition of T that it cannot be less than zero, or if n is even, be less than $\frac{n}{4}$. The configuration just given shows that these minima are, in fact, attainable.

Thus T can vary from a maximum of $\frac{n^3 - n}{12}$ to a minimum of zero or $\frac{n}{4}$. Hence the maximum number of circular triads, being half the variation from maximum to minimum of T (the maximum of T corresponding to the ranking case in which there are no inconsistences), is $\frac{n^3 - 4n}{24}$ if n is even and $\frac{n^3 - n}{24}$ if n is odd.

This establishes the two results enunciated at the beginning of this section.
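For small n the two results can be confirmed by direct enumeration. The following sketch (function names hypothetical) runs through every configuration of preferences, counts the circular triads d directly, checks that d equals half the difference between the maximum of T and T itself, and records the largest d found.

```python
from itertools import combinations, product


def max_circular_triads(n):
    """Largest number of circular triads over all configurations on n objects."""
    pairs = list(combinations(range(n), 2))
    triples = list(combinations(range(n), 3))
    t_max = (n ** 3 - n) / 12
    mean = (n - 1) / 2
    best = 0
    for bits in product((0, 1), repeat=len(pairs)):
        wins = set()          # (preferred, rejected) pairs
        alpha = [0] * n       # arrows leaving each vertex
        for (i, j), bit in zip(pairs, bits):
            winner, loser = (i, j) if bit else (j, i)
            wins.add((winner, loser))
            alpha[winner] += 1
        # A triad is circular if its arrows run round in either direction.
        d = sum(
            1
            for i, j, k in triples
            if {(i, j), (j, k), (k, i)} <= wins or {(j, i), (k, j), (i, k)} <= wins
        )
        T = sum((a - mean) ** 2 for a in alpha)
        assert d == (t_max - T) / 2   # relation between d and T proved above
        best = max(best, d)
    return best


def stated_maximum(n):
    """Maximum given by theorem (1)."""
    return (n ** 3 - n) // 24 if n % 2 else (n ** 3 - 4 * n) // 24


if __name__ == "__main__":
    for n in (3, 4, 5):
        print(n, max_circular_triads(n), stated_maximum(n))
```

The enumeration grows as 2 to the power n(n − 1)/2, so the check is practicable only for a few objects.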


Coefficient of Consistence in Paired Comparisons 

16.43. If d is the number of circular triads in an observed configuration of preferences we define

\[ C = 1 - \frac{24d}{n^3 - n}, \quad n \text{ odd}; \qquad C = 1 - \frac{24d}{n^3 - 4n}, \quad n \text{ even} \tag{16.71} \]

and call C the coefficient of consistence. If and only if it is unity, there are no inconsistences in the configuration, which may therefore be represented by a ranking. As C decreases to zero the inconsistence, as measured by the number of circular triads, increases.

For example, in the configuration of Fig. 16.2 there are five circular triads, ABD, ACD, AFD, AED and BEF. The maximum possible number is 8. Thus C = 0.375.
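The example may be verified from the preferences of Table 16.9. In the sketch below the dictionary `PREFS` is a transcription of that table as reconstructed from the preferences and circular triads named in the text (an assumption where the printed cells are illegible), and the function names are hypothetical.

```python
from itertools import combinations

# PREFS[X] lists the objects to which X is preferred (the units in row X).
PREFS = {"A": "BCEF", "B": "DE", "C": "BDEF", "D": "A", "E": "DF", "F": "BD"}
OBJECTS = sorted(PREFS)


def circular_triads():
    """Return the triads in which each member is preferred exactly once."""
    found = []
    for triad in combinations(OBJECTS, 3):
        wins = [sum(other in PREFS[x] for other in triad if other != x) for x in triad]
        if wins == [1, 1, 1]:
            found.append("".join(triad))
    return found


def consistence(d, n):
    """Coefficient of consistence C of (16.71)."""
    divisor = n ** 3 - n if n % 2 else n ** 3 - 4 * n
    return 1 - 24 * d / divisor


triads = circular_triads()
C = consistence(len(triads), len(OBJECTS))

# The same count obtained from the row sums by way of T.
n = len(OBJECTS)
T = sum((len(PREFS[x]) - (n - 1) / 2) ** 2 for x in OBJECTS)
d_from_T = ((n ** 3 - n) / 12 - T) / 2

if __name__ == "__main__":
    print(triads, C, T, d_from_T)
```

The triads are reported in alphabetical order, so that AED and AFD of the text appear as ADE and ADF.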



C can also be interpreted in the light of Table 16.9. Suppose, in that table, we sum the rows. (The column sums are determined by the row sums and add no fresh information.) The sum of any row will be the α-number for that vertex in the polygon which corresponds to the object defining the row. T will then be the value of the sum of squares of deviations of row totals from the mean value $\frac{n-1}{2}$, that is to say, will be the variance of the row sums multiplied by n. C is thus a linear function of this variance; but it cannot be tested in the χ²-distribution as if Table 16.9 were a contingency table, for the border cells are not independent or linearly dependent.

16.44. If an individual observer produces a configuration of preferences which show inconsistence there are usually several explanations; he may be an incompetent judge, the objects may be so alike that consistent differentiation is not possible, or his attention may wander during the course of the experiment. We discuss these questions later. They are mentioned here to explain the motive for the next stage of the mathematics. With what probability can a value of C arise by chance if the observer allots his preferences at random with respect to the quality under consideration?

With n objects there are $2^{\binom{n}{2}}$ possible configurations of preferences. We proceed to investigate the distribution of d in this population of $2^{\binom{n}{2}}$ different members. The method consists of proceeding from the distribution for n to that for (n + 1).

For n = 3 there are eight configurations, of which two give one circular triad and six no circular triads. Consider the effect of adding a new vertex D to the vertices ABC. Four cases arise:

(1) D → all A, B, C.
(2) D → two of A, B, C.
(3) D → one of A, B, C.
(4) D → none of A, B, C.

The last two are symmetrical with the first two and need not be separately considered. Situation (1) arises in one way and clearly does not add any new circular triads other than those already existing in the configuration ABC. It therefore contributes six values d = 0 and two values d = 1. So does situation (4).

Situation (2) arises in three ways, according as D ← A, B, or C. The configurations so reached are similar and we may take any one, say D ← C, as the single preference. If A ← C then DAC is not circular and if B ← C then DBC is not circular. On the other hand A → C and B → C will each produce a circular triad. We then have the cases


                          No. of Circular
                           Triads added.
   A ← C,  B ← C                0
   A → C,  B → C                2
   A → C,  B ← C                1
   A ← C,  B → C                1


We now consider AB. In the first two cases just enumerated the direction of AB does not matter and no circular triads are added. With the third A → B gives no circular triad but A ← B adds one. With the fourth A → B adds one and A ← B adds none.

Thus the number of circular triads occurring for these four cases is found to be

   No. of Circular Triads.     Frequency.
            0                      2
            1                      2
            2                      4

We must multiply the frequency by three and by two to allow for similar symmetrical arrangements, and the final results are

   No. of Circular Triads.     Frequency.
            0                     24
            1                     16
            2                     24
          Total                   64

The principles of this method are clear enough and the work may be formalised by a number of conventions which we omit to save space. In common with many similar combinatorial problems, however, troubles arise from the sheer number of possibilities and the difficulty of ensuring that nothing is overlooked. Up to the present the distribution of d for n up to and including 7 is known. The frequencies and probabilities are given in Table 16.10.
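For the smallest values of n the frequencies can be obtained by brute enumeration instead of the step-by-step method. The sketch below (function name hypothetical) runs through all configurations and obtains d for each from the row sums, using the relation that d is half the difference between the maximum of T and T.

```python
from collections import Counter
from itertools import combinations, product


def circular_triad_distribution(n):
    """Frequency of each value of d over all configurations on n objects."""
    pairs = list(combinations(range(n), 2))
    t_max = (n ** 3 - n) / 12
    mean = (n - 1) / 2
    freq = Counter()
    for bits in product((0, 1), repeat=len(pairs)):
        alpha = [0] * n
        for (i, j), bit in zip(pairs, bits):
            alpha[i if bit else j] += 1      # arrow leaves the preferred object
        T = sum((a - mean) ** 2 for a in alpha)
        freq[round((t_max - T) / 2)] += 1
    return dict(sorted(freq.items()))


if __name__ == "__main__":
    for n in (3, 4, 5):
        print(n, circular_triad_distribution(n))
```

For n = 7 there are 2 to the power 21 configurations, which remains within reach of this direct approach, though slowly.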


Paired Comparisons for m Observers: Coefficient of Agreement 

16.45. We now consider the investigation of similarities of judgments for m observers. Suppose that in a table of the form of Table 16.9 we enter a unit in the cell in row X and column Y whenever X → Y and count the units in each cell. A cell may then contain any number from 0 to m. If the observers are in complete agreement there will be $\binom{n}{2}$ cells containing the number m, the remaining cells being zero. The agreement may be complete even if there are inconsistences present.

Suppose that the cell in row X and column Y contains the number γ. Let

\[ \Sigma = \sum \binom{\gamma}{2} \tag{16.72} \]

the summation extending over the n(n − 1) cells of the table (the diagonal cells being ignored). Σ is then the sum of the number of agreements between pairs of judges. Put

\[ u = \frac{2\Sigma}{\binom{m}{2}\binom{n}{2}} - 1 \tag{16.73} \]

TABLE 16.10

Paired Comparisons. Frequency (f) of Values of d and Probability (P) that Values will be Attained or Exceeded.

The maximum number of agreements, occurring if $\binom{n}{2}$ cells each contain m, is $\binom{n}{2}\binom{m}{2}$, and thus in the case of complete agreement, and only in this case, u = 1. The further we go from this case, as measured by agreements between pairs of observers, the smaller u becomes. The minimum number of agreements occurs when each cell contains $\frac{m}{2}$ if m is even or $\frac{m \pm 1}{2}$ if m is odd. That is, if m is even, the minimum number of agreements is

\[ \binom{n}{2}\frac{m(m-2)}{4}, \]

and in this case

\[ u = -\frac{1}{m-1} \tag{16.74} \]

When m is odd the minimum value of u is found to be

\[ u = -\frac{1}{m} \tag{16.75} \]

16.46. We shall call u the Coefficient of Agreement. It is unity if and only if there is complete agreement in the comparisons. Its minimum value is not −1 except when m = 2. This, however, is to be expected in a measure of agreement, for there can be no such thing as complete disagreement among three or more observers in paired comparisons. If observer P differs in certain comparisons from observers Q and R, the two latter must agree on those comparisons.

When m = 2, u reduces to

\[ u = \frac{2\Sigma}{\binom{n}{2}} - 1 \tag{16.76} \]


and Σ becomes the number of cases in which the two observers agree about a comparison, each such case giving one cell containing 2. It is thus a generalisation of a coefficient τ. For general m, if the entries in the table were constrained to the ranking type, u would be the average intercorrelation τ between observers taken two at a time.
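A minimal sketch of the calculation of Σ and u follows, assuming the preferences are held as a square table of counts in which `counts[x][y]` is the number of observers preferring object x to object y. The function name and the three small tables are hypothetical illustrations, chosen to exhibit the limiting values of u.

```python
from math import comb


def agreement(counts, m):
    """Return (Sigma, u) for a table of preference counts from m observers."""
    n = len(counts)
    sigma = sum(
        comb(counts[x][y], 2) for x in range(n) for y in range(n) if x != y
    )
    u = 2 * sigma / (comb(m, 2) * comb(n, 2)) - 1
    return sigma, u


# Complete agreement: four observers, three objects, all preferring in the same way.
complete = [[0, 4, 4], [0, 0, 4], [0, 0, 0]]
# Four observers divided two against two on every comparison (m even).
split_even = [[0, 2, 2], [2, 0, 2], [2, 2, 0]]
# Three observers divided two against one on every comparison (m odd).
split_odd = [[0, 2, 2], [1, 0, 2], [1, 1, 0]]

if __name__ == "__main__":
    print(agreement(complete, 4))     # u = 1
    print(agreement(split_even, 4))   # u = -1/(m - 1)
    print(agreement(split_odd, 3))    # u = -1/m
```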


16.47. In discussing the significance of u it is desirable to know whether the set of preferences which give rise to it could have arisen by chance if the preferences had been assigned at random with respect to the quality under consideration. The procedure which first suggests itself is a generalisation of the method used for the case of m rankings. That is to say, we sum the entries in the rows of the table and consider the variance of these entries. If the preferences are allotted at random we expect to find about equal numbers given to each object, and the variance will be low; in other cases it will be higher.

The difficulty about this suggestion is that it has not been found possible to ascertain the distribution of the variance in the $2^{m\binom{n}{2}}$ possible sets of preferences. The case m = 1, corresponding to the distribution of d for inconsistences, is difficult enough to solve. For higher values of m no distributions are known except in trivial cases.

A test can, however, be devised by using the coefficient u. Consider one cell in the table in row X and column Y and let it contain the number γ. Then the corresponding cell in row Y and column X will contain m − γ. Thus these two contribute to Σ the amount

\[ \binom{\gamma}{2} + \binom{m-\gamma}{2}. \]

Now, of the $2^m$ total ways in which the units can be distributed in the first cell there will be $\binom{m}{\gamma}$ in which γ units occur. Consequently the distribution of Σ in the cell and the corresponding cell is given by the expression

\[ \sum_{\gamma=0}^{m} \binom{m}{\gamma}\, t^{\binom{\gamma}{2} + \binom{m-\gamma}{2}} \tag{16.77} \]

in which the power of the dummy variable t marks the value of the variate and its coefficient the frequency; and since the distribution in other pairs of cells is independent if the preferences are allotted at random the distribution of Σ for the whole table is given by

\[ D(\Sigma) = \left[ \sum_{\gamma=0}^{m} \binom{m}{\gamma}\, t^{\binom{\gamma}{2} + \binom{m-\gamma}{2}} \right]^{N} \tag{16.78} \]

where $N = \binom{n}{2}$.


16.48. The distributions have been worked out for the following values of m and n: m = 3, n = 2 to 8; m = 4, n = 2 to 6; m = 5, n = 2 to 5; m = 6, n = 2 to 4. Tables 16.11 to 16.14 give the probabilities based on these distributions, i.e. the probabilities that a given value of Σ will be attained or exceeded.
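The exact distributions are readily generated by repeated convolution of the distribution for a single pair of related cells. The sketch below is one way of doing this and is not claimed to be the method by which the tables were computed. The function names are hypothetical. It uses the fact that, under random allotment, the number in a cell is binomial with index m and probability one-half.

```python
from fractions import Fraction
from math import comb


def sigma_distribution(m, n):
    """Exact probabilities of each value of Sigma for m observers and n objects."""
    # Distribution of the contribution from one pair of related cells.
    single = {}
    for g in range(m + 1):
        value = comb(g, 2) + comb(m - g, 2)
        single[value] = single.get(value, 0) + Fraction(comb(m, g), 2 ** m)
    # Convolve over the N = C(n, 2) independent pairs of cells.
    dist = {0: Fraction(1)}
    for _ in range(comb(n, 2)):
        new = {}
        for total, p in dist.items():
            for value, q in single.items():
                new[total + value] = new.get(total + value, 0) + p * q
        dist = new
    return dist


def tail_probability(m, n, value):
    """Probability that Sigma attains or exceeds the given value."""
    dist = sigma_distribution(m, n)
    return float(sum(p for total, p in dist.items() if total >= value))


if __name__ == "__main__":
    print(tail_probability(3, 2, 3))    # 0.25
    print(tail_probability(3, 3, 9))    # 0.015625
    print(tail_probability(3, 8, 54))
    print(tail_probability(3, 8, 58))
```

The last two values may be compared with the probabilities 0.011 and 0.0011 quoted from Table 16.11 in 16.51.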



For constant n the distribution tends to the Type III form as m tends to infinity. In fact, for a single pair of related cells the variate-value corresponding to a frequency $\binom{m}{\gamma}$ is $\binom{\gamma}{2} + \binom{m-\gamma}{2}$, which is a quadratic in γ. Were the variate-value a linear function of γ the distribution for the single cell would tend to normality in accordance with the well-known property of the binomial. The case of the quadratic value corresponds to a transformation of the variate of the type x² = y, and the transform of the normal form exp(−x²) dx becomes the Type III form $y^{-\frac{1}{2}} \exp(-y)\, dy$. Since the N cells are independent and the sum of variates in the same Type III form is also distributed in that form, it follows that Σ is in the limit distributed as $\Sigma^{\frac{N}{2} - 1} \exp(-\Sigma)\, d\Sigma$ except perhaps for some constants. Thus Σ or some multiple of it is distributed as χ².

For constant m the distribution tends to normality with increasing n.


TABLE 16.11

Agreement in Paired Comparisons. The Probability P that a Value of Σ will be Attained or Exceeded, for m = 3, n = 2 to 8.


n 

« 2 

ft 

an 3 

n 

«= 4 

n 

= 6 

n 

6 

n 

=- 7 

n 8 


P 

z 

P 

H 

P 

r 

P 

E 

P 

E 

P 

E 

P 

1 

1000 

a 

1000 

6 

1000 

10 

1-000 

16 

1-000 

21 

1 000 

28 

1-000 

3 

0'260 

5 

0-678 

8 

0-822 

12 

0-944 

17 

0-987 

23 

0-998 

30 

1-000 



7 

0156 

10 

0-466 

14 

0-756 

19 

0-920 

26 

0-981 

32 

0-997 



9 

0016 

12 

0-169 

16 

0*474 

21 

0-764 

27 

0-925 

34 

0-983 





14 

0-038 

18 

0-224 

23 

0-539 

29 

0-808 

36 

0-946 





16 

0-0046 

20 

0-078 

25 

0-314 

31 

0-633 

38 

0-865 





18 

0-0»24 

22 

0-020 

27 

0-148 

33 

0-433 

40 

0-736 







24 

0-0035 

29 

0-057 

35 

0-256 

42 

0-672 







26 

0-0*42 

31 

0-017 

37 

0-130 

44 

0-400 







28 

0-0^30 

33 

0-0042 

39 

0066 

46 

0-250 







30 

0'0*95 

36 

00»79 

41 

0-021 

48 

0-138 









37 

0-0»12 

43 

0-0064 

50 

0-068 









39 

00M2 

45 

0-0017 

52 

0-029 









41 

0-0*92 

47 

0-0»37 

54 

0-011 









43 

00’43 

49 

0-0*68 

56 

0-0038 









45 

0-0*93 

51 

! 0-0*10 

58 

1 0-0011 











63 

1 0-0*12 

60 

1 0-0*29 











56 

i 0-0»12 

62 

0-0*66 











67 

1 0-0*86 

64 

0-0*13 











59 

1 0-0*44 

66 

0-0*22 





1 






61 

00i«16 

: 68 

0-0*32 





i 






63 

0-01*23 

70 

00’40 





i 








72 

0-0*42 













74 

0-0*36 













76 

0-01*24 








■ 





78 

0-01113 













80 

0-01*48 








1 





82 

0-01*12 













84 

1 

0-01*14 





* 


TABLE 16.12

Agreement in Paired Comparisons. The Probability P that a Value of Σ will be Attained or Exceeded, for m = 4 and n = 2 to 6 (for n = 6 only Values beyond the 1 per cent Point are given).


n 

08 2 

n 

8 , 3 

n 

S8 4 

n 

n 

6 

n 

» 5 

1 


n 

« 0 

X* 

g 


B 

X 

g 

X 

P 

X 

P 


D 


P 

2 

1000 

6 

1000 

12 

1-000 

20 

1-000 

42 

0-0048 

57 

0-014 

79 

0-0*42 

3 

0'625 

7 

0047 

13 

0-997 

21 

1 000 

43 

0-0030 

58 

0-0092 

80 

00«28 

e 

0 125 

8 

0-736 

14 

0-976 

22 

0-999 

44 

0-0017 

59 

0-0058 

81 

0-0*98 



0 

0‘455 

15 

0-901 

23 

0-995 

45 

0-0*73 

60 

00037 

82 

00*16 



10 

0-330 

16 

0-769 

24 

0-979 

46 

0-0*41 

61 

0-0022 

83 

0-0*12 



11 

0-277 

17 

0-632 

26 

0-942 

47 

0-0*24 

62 

0 0013 

84 

00^*51 



12 

0-137 

18 

0-624 

26 

0-882 

48 

0-0*90 

63 

00*76 

86 

00“30 



14 

0-043 

19 

0-410 

27 

0-806 

49 

00«37 

64 

0-0*44 

87 

0-0“17 



15 

0-026 

20 

0-278 

28 

0-719 1 

60 

0-0*26 

65 

0-0*23 

90 

00»»28 



18 

0-0020 

21 

0 185 

29 

0-621 

61 

0-0*93 

66 

0-0*13 







22 

0-137 

30 

0-614 

62 

0-0*21 

67 

00*72 







23 

0-088 

31 

0-413 

63 

0-0*17 

68 

00*36 







24 

0-044 

32 

0 327 j 

64 

0-0*74 

69 

0-0*18 







26 

0-027 

33 

0-249 ' 

66 

00’66 

70 

0-0*97 







26 

0-019 

34 

0-179 

67 

00’38 

71 

00»47 







27 

0-0079 

36 

0-127 

60 

0-0*93 

72 

00‘20 







28 

0-00301 

36 

0-090 



73 

00‘10 







29 

0-0026 i 

37 

0-060 



74 

00*51 







30 

0-0011 j 

38 

0-038 



75 1 

OO'IS 







32 

0-0*16 

39 

0-024 



76 1 

00’78 







33 

0-0*95 1 

40 

0-016 



77 i 

00’44 







36 

0-0*38 

41 

0-0088 



78 

00’15 




TABLE 16.13

Agreement in Paired Comparisons. The Probability P that a Value of Σ will be Attained or Exceeded, for m = 5 and n = 2 to 5.

TABLE 16.14

Agreement in Paired Comparisons. The Probability P that a Value of Σ will be Attained or Exceeded, for m = 6 and n = 2 to 4.


n 

2 

n 

ssa 3 

n 

4 

n 

« 4 

H 

me 4 

r 

P 

P 

P 

Z 

p 


P 

Z 

P 

6 

1000 

18 

1-000 

36 

1-000 

56 

0-043 

74 

0 - 0*12 

7 

0-688 

19 

0-969 

37 

0-999 

56 

0-029 

76 

0 * 0 » H 9 

10 

0-219 

20 

0-832 

38 

0-991 

67 

0-020 

76 

0 - 0*49 

15 

0-031 

21 

0-626 

39 

0-969 

58 

0-016 

77 

0 - 0*32 



22 

0-523 

40 

0-896 

59 

0-011 

80 

00*68 



23 

0-468 

41 

0-822 

60 

0 0072 

81 

0 * 0*17 



24 

0-303 

42 

0-765 

61 

0-0049 

82 

00'12 



26 

0-180 

43 

0-669 

62 

0-0034 

85 1 

00’34 



27 

0-147 

44 

0-656 

63 

00025 

90 1 

00*93 



28 

0-088 

45 

0-466 

64 

0*0016 





29 

0-061 

46 

0-409 

66 

00«83 





30 

0-040 

47 

0 337 

66 

00»66 





31 

0-034 

48 1 

0-257 

67 

0 - 0*48 





32 

0-023 

49 

0-209 

68 

0 - 0*26 





36 

0-0062 

60 

0-176 

69 

0 - 0*16 





36 

00029 

61 

0-133 

70 

0 - 0^86 





37 

0-0020 

62 

0-097 1 

71 

0 - 0^68 





40 

00»68 

63 

0-073 ! 

72 

00 M 8 





46 

0 * 0^31 

54 

0-057 

73 

0 - 0 M 6 




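16.49. The first of these results suggests that the Type III distribution will provide an approximation to the distribution (16.78) when m is moderately large. We proceed to find the first four moments of (16.78).

It is sufficient to find the first four moments of (16.77), those of (16.78) being obtainable therefrom in virtue of the relationships which connect cumulants of independent distributions. The rth moment of (16.77) about the origin is given by

\[ \mu'_r = \frac{1}{2^m} \sum_{\gamma=0}^{m} \binom{m}{\gamma} \left[ \binom{\gamma}{2} + \binom{m-\gamma}{2} \right]^r \tag{16.79} \]

since $2^m$ is the total frequency. Thus we have

\[ \mu'_1 = \frac{1}{2^m} \sum_{\gamma=0}^{m} \binom{m}{\gamma} \left\{ \gamma^2 - m\gamma + \binom{m}{2} \right\} \tag{16.80} \]

Sums such as $\sum \binom{m}{\gamma} \gamma^p$ may be obtained by operating on the binomial $(1 + x)^m$ p times by $x \frac{d}{dx}$ and putting x = 1, e.g. we find

\[ \sum \binom{m}{\gamma} \gamma = m\,2^{m-1}, \qquad \sum \binom{m}{\gamma} \gamma^2 = m(m+1)\,2^{m-2}, \]

and hence, substituting in (16.80),

\[ \mu'_1 = \frac{1}{2}\binom{m}{2}. \]

Thus the mean of the distribution (16.78) is given by

\[ \mu'_1 = \frac{1}{2}\binom{n}{2}\binom{m}{2} \tag{16.81} \]

In a similar way we find

\[ \mu_2 = \frac{1}{4}\binom{n}{2}\binom{m}{2}, \qquad \mu_3 = \frac{3}{4}\binom{n}{2}\binom{m}{3}, \qquad \mu_4 = 3\mu_2^2 + \binom{n}{2}\binom{m}{2}\,\frac{3m^2 - 15m + 17}{8} \tag{16.82} \]

These are the moments of Σ. Those of u are obtained by dividing by an appropriate power of $\frac{1}{2}\binom{m}{2}\binom{n}{2}$. It may be noted in particular that the mean of u is zero.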
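The moments of the distribution for a single pair of related cells can be checked in exact arithmetic; those of the whole table follow by multiplying the mean, the second and third moments and the fourth cumulant by N, since cumulants of independent variates are additive. The sketch below (function name hypothetical) compares the computed values with closed forms, which are written here in expanded form as an assumption about how the binomial-coefficient expressions reduce: m(m − 1)/4, m(m − 1)/8, m(m − 1)(m − 2)/8 and, for the fourth cumulant, m(m − 1)(3m² − 15m + 17)/16.

```python
from fractions import Fraction
from math import comb


def pair_moments(m):
    """Mean and central moments mu2, mu3, mu4 of the contribution of one cell pair."""
    terms = [
        (Fraction(comb(m, g), 2 ** m), comb(g, 2) + comb(m - g, 2))
        for g in range(m + 1)
    ]
    mean = sum(p * v for p, v in terms)

    def central(r):
        return sum(p * (v - mean) ** r for p, v in terms)

    return mean, central(2), central(3), central(4)


if __name__ == "__main__":
    for m in range(3, 9):
        mean, mu2, mu3, mu4 = pair_moments(m)
        kappa4 = mu4 - 3 * mu2 ** 2
        print(m, mean, mu2, mu3, kappa4)
```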

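16.50. The first four moments of the Type III distribution

\[ dF = k\, e^{-\beta x}\, x^{\alpha - 1}\, dx \]

are

\[ \mu'_1 = \frac{\alpha}{\beta}, \qquad \mu_2 = \frac{\alpha}{\beta^2}, \qquad \mu_3 = \frac{2\alpha}{\beta^3}, \qquad \mu_4 = \frac{3\alpha(\alpha + 2)}{\beta^4}. \]

Equating the second and third moments to those given by (16.82) we find

\[ \alpha = \frac{Nm(m-1)}{2(m-2)^2}, \qquad \beta = \frac{2}{m-2} \tag{16.83} \]

To make the first moments correspond we move the origin of the Σ-distribution a distance $\frac{1}{2}\binom{n}{2}\binom{m}{2}\frac{m-3}{m-2}$ to the right. We thus reach the approximation to the Σ-distribution, coinciding in the first three moments,

\[ dF = k\, e^{-\frac{2x}{m-2}}\, x^{\frac{Nm(m-1)}{2(m-2)^2} - 1}\, dx, \]

where $x = \Sigma - \frac{1}{2}\binom{n}{2}\binom{m}{2}\frac{m-3}{m-2}$; or, transforming to the more usual form by putting $\chi^2 = \frac{4x}{m-2}$,

\[ \chi^2 = \frac{4}{m-2} \left\{ \Sigma - \frac{1}{2}\binom{n}{2}\binom{m}{2}\frac{m-3}{m-2} \right\} \tag{16.84} \]

is distributed as χ² with

\[ \nu = \binom{n}{2}\frac{m(m-1)}{(m-2)^2} \tag{16.85} \]

degrees of freedom.

The fourth moments of Σ and the χ² approximation differ by terms of order $N^{-1}$ and $m^{-1}$ compared with their absolute values.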


16.51. It only remains to be seen how large m and n must be for this to provide a satisfactory approximation.

Consider first the distributions for m = 3. When n = 8, N = 28, we have, for the approximation, 4Σ distributed with 168 degrees of freedom. From Table 16.11 we see that for Σ = 54, P = 0.011 and for Σ = 58, P = 0.0011. Applying a continuity correction by deducting unity from Σ we find for the χ² approximation with χ² = 4 × 53, ν = 168, P = 0.011, and with χ² = 4 × 57, P = 0.00114. The correspondence is very close, in spite of the low value of m.

For m = 4, n = 5, N = 10, the approximation gives 2Σ − 30 distributed with 30 degrees of freedom. For Σ = 40 and 41, this gives, with continuity corrections of 0.5, half the variate-interval, χ² = 49 and 51, ν = 30. From the diagram at the end it is seen that these values lie one on either side of the 1 per cent value; and this is in accordance with the exact values of P, which are seen from Table 16.12 to be 0.016 and 0.0088. Similarly we find that the values of Σ, 37 and 38, lie on either side of the 5 per cent level, which is again in accordance with the exact values, P = 0.060 and 0.038.

For m = 6, n = 4, N = 6, the approximation gives Σ − 33.75 distributed with 11.25 degrees of freedom. For Σ = 59 and 60 the corresponding χ² values are seen to lie on either side of the 1 per cent point, which accords with the exact value of Table 16.14.

We conclude that the χ² approximation provides an adequate test of significance for the values of m and n outside the range for which Tables 16.13 and 16.14 give exact values.
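The transformation used in these comparisons may be set out as a short sketch. The function name `chi2_approximation` is hypothetical, and the forms of the shift and of the degrees of freedom are those implied by the three worked cases (4Σ with 168, 2Σ − 30 with 30, and Σ − 33.75 with 11.25 degrees of freedom). The sketch returns only the value of χ² and of ν; the probability must still be read from a table of χ².

```python
from math import comb


def chi2_approximation(sigma, m, n, correction=0.0):
    """Return (chi-squared, degrees of freedom) for an observed Sigma.

    correction is the continuity correction deducted from Sigma
    (half the interval between successive values of the variate).
    """
    N = comb(n, 2)
    shift = 0.5 * N * comb(m, 2) * (m - 3) / (m - 2)
    chi2 = 4.0 / (m - 2) * (sigma - correction - shift)
    nu = N * m * (m - 1) / (m - 2) ** 2
    return chi2, nu


if __name__ == "__main__":
    print(chi2_approximation(54, 3, 8, correction=1))    # 4 x 53 with 168 d.f.
    print(chi2_approximation(40, 4, 5, correction=0.5))  # 49 with 30 d.f.
    print(chi2_approximation(41, 4, 5, correction=0.5))  # 51 with 30 d.f.
    print(chi2_approximation(59, 6, 4))                  # Sigma - 33.75, 11.25 d.f.
```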

Example 16.7 

A class of boys (ages 11 to 13 inclusive) were asked to state their preferences with respect to certain school subjects. Each child was given a sheet on which were written the possible pairs of subjects and asked to underline the one preferred in each case. The results were as follows:

21 boys, 13 school subjects. The preferences are shown in Table 16.15, which is in the form described in 16.39; e.g. there were 18 boys who preferred Art to Religion.

TABLE 16.16

Preferences of 21 Boys in 13 Subjects.

                             1   2   3   4   5   6   7   8   9  10  11  12  13   Totals

 1. Woodwork                 -  14  20  15  15  16  16  18  18  18  20  21  20     211
 2. Gymnastics               7   -  14  12  13  18  14  16  16  20  16  18  19     183
 3. Art                      1   7   -  10  14  10  16  18  16  16  17  16  19     160
 4. Science                  6   9  11   -  11  12  15  14  13  13  17  17  16     154
 5. History                  6   8   7  10   -  14  11  12  14  15  13  14  16     140
 6. Geography                5   3  11   9   7   -  14  14  13  13  16  15  17     137
 7. Arithmetic               5   7   5   6  10   7   -   9  11  13  15  13  15     116
 8. Religion                 3   5   3   7   9   7  12   -  12  14  14  16  14     116
 9. English Literature       3   5   5   8   7   8  10   9   -  10  13  13  15     106
10. Commercial subjects      3   1   5   8   6   8   8   7  11   -  10  10  14      91
11. Algebra                  1   5   4   4   8   5   6   7   8  11   -  10  13      82
12. English Grammar          0   3   5   4   7   6   8   5   8  11  11   -  13      81
13. Geometry                 1   2   2   5   5   4   6   7   6   7   8   8   -      61

                                                                        Total     1638




The calculation of Σ for this table, in which the objects are arranged in order of total
number of preferences, may be shortened by noting that Σ, as given by equation (16.72),
may be transformed into the form

$$\Sigma = \sum \gamma^{2} - m \sum \gamma + \binom{m}{2}\binom{n}{2},$$

where γ denotes a cell entry and the summation now takes place over the half of the table
below the diagonal. Since the numbers in this half are smaller than those in the other half
there is a considerable saving in arithmetic.

We find Σ = 9718 and hence

$$u = \frac{2 \times 9718}{\binom{21}{2}\binom{13}{2}} - 1 = 0.187.$$
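
The arithmetic can be checked mechanically. The sketch below is illustrative only: the function names are ours, and the matrix is Table 16.16 with doubtful digits in the scan settled by two assumptions, namely that opposite cells must sum to 21 and that each row must match its printed total. It computes Σ both from the full table and from the half below the diagonal, then the coefficient of agreement u.

```python
from math import comb

M_BOYS = 21
# PREFS[i][j] = number of boys preferring subject i+1 to subject j+1 (diagonal unused).
PREFS = [
    [0, 14, 20, 15, 15, 16, 16, 18, 18, 18, 20, 21, 20],
    [7, 0, 14, 12, 13, 18, 14, 16, 16, 20, 16, 18, 19],
    [1, 7, 0, 10, 14, 10, 16, 18, 16, 16, 17, 16, 19],
    [6, 9, 11, 0, 11, 12, 15, 14, 13, 13, 17, 17, 16],
    [6, 8, 7, 10, 0, 14, 11, 12, 14, 15, 13, 14, 16],
    [5, 3, 11, 9, 7, 0, 14, 14, 13, 13, 16, 15, 17],
    [5, 7, 5, 6, 10, 7, 0, 9, 11, 13, 15, 13, 15],
    [3, 5, 3, 7, 9, 7, 12, 0, 12, 14, 14, 16, 14],
    [3, 5, 5, 8, 7, 8, 10, 9, 0, 10, 13, 13, 15],
    [3, 1, 5, 8, 6, 8, 8, 7, 11, 0, 10, 10, 14],
    [1, 5, 4, 4, 8, 5, 6, 7, 8, 11, 0, 10, 13],
    [0, 3, 5, 4, 7, 6, 8, 5, 8, 11, 11, 0, 13],
    [1, 2, 2, 5, 5, 4, 6, 7, 6, 7, 8, 8, 0],
]

def sigma_full(table):
    """Sum of C(gamma, 2) over every off-diagonal cell."""
    n = len(table)
    return sum(comb(table[i][j], 2) for i in range(n) for j in range(n) if i != j)

def sigma_lower(table, m):
    """Same quantity from the cells below the diagonal only."""
    n = len(table)
    lower = [table[i][j] for i in range(n) for j in range(i)]
    return sum(g * g for g in lower) - m * sum(lower) + comb(m, 2) * comb(n, 2)

def agreement_u(sigma, m, n):
    """Coefficient of agreement u for m observers and n objects."""
    return 2 * sigma / (comb(m, 2) * comb(n, 2)) - 1

if __name__ == "__main__":
    s = sigma_lower(PREFS, M_BOYS)
    print(s, sigma_full(PREFS), round(agreement_u(s, M_BOYS, len(PREFS)), 4))
```

The shortcut works because C(γ, 2) + C(m − γ, 2) = γ² − mγ + C(m, 2) for each pair of opposite cells.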


There is thus a certain amount of agreement among the children, indicated by the
positive value of u. Is this significant?

We note first of all that this distribution of preferences could not have arisen by chance
to any acceptable degree of probability. In fact, χ² = 412.4 (equation (16.84)) and ν = 90.7.
The large value of ν justifies the use of the normal approximation to the χ²-distribution
and we find √(2χ²) − √(2ν − 1) = 15.3, a very improbable result on the hypothesis of
a random allocation of preferences.

The distribution of circular triads was as follows:

No. of Triads.   Frequency.      No. of Triads.   Frequency.

      0              1                12              1
      1              1                17              3
      4              5                21              1
      6              2                25              1
      7              2                29              1
      8              1                39              1
     10              1

                                      Total          21


The total number of circular triads was 242 with a mean of 11.5. Only one boy was
entirely consistent. On the other hand, for n = 13 the maximum number of circular
triads is 91, with a mathematical expectation of 71.5. It is thus clear that, except perhaps
for one boy, we cannot suppose that any boy allotted preferences at random. We are
again led to conclude that the boys are genuinely capable of making distinctions, and that
consistently on the whole. Half the boys have coefficients of consistence ζ greater than 0.92.
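
These figures follow directly from the frequency table of circular triads. The sketch below is illustrative and uses our own names; it assumes the usual expressions for the maximum number of circular triads, (n³ − n)/24 for odd n and (n³ − 4n)/24 for even n, the expectation C(n, 3)/4 under random allocation, and ζ = 1 − d / (maximum) for a boy with d circular triads.

```python
from math import comb

# Number of circular triads -> number of boys, as tabulated in Example 16.7.
TRIADS = {0: 1, 1: 1, 4: 5, 6: 2, 7: 2, 8: 1, 10: 1,
          12: 1, 17: 3, 21: 1, 25: 1, 29: 1, 39: 1}

def max_triads(n):
    """Largest possible number of circular triads among n objects."""
    return (n**3 - n) // 24 if n % 2 else (n**3 - 4 * n) // 24

def expected_triads(n):
    """Expected number of circular triads when every preference is made at random."""
    return comb(n, 3) / 4

def consistence(d, n):
    """Coefficient of consistence for an observer with d circular triads."""
    return 1 - d / max_triads(n)

if __name__ == "__main__":
    boys = sum(TRIADS.values())
    total = sum(d * f for d, f in TRIADS.items())
    high = sum(f for d, f in TRIADS.items() if consistence(d, 13) > 0.92)
    print(boys, total, round(total / boys, 1))       # boys, total triads, mean
    print(max_triads(13), expected_triads(13), high)
```

Eleven of the 21 boys have seven or fewer circular triads, which is the sense in which half of them exceed ζ = 0.92.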

We conclude that the boys can make preferences and that in their view the subjects
are sufficiently different to enable a reasonably consistent set of preferences to be made.
So far as these data are concerned there would be no objection to the assumption that
a scale of preferences can be set up. With this in mind, we can say that the value of
u indicates a certain amount of agreement, though not a strong one, between the boys as
to which subjects they prefer.



NOTES AND REFERENCES

Spearman has suggested another coefficient of rank correlation, viz.

$$R = 1 - \frac{3\sum |d|}{n^{2} - 1},$$

but this “footrule” is unreliable as a measure of dependence; it cannot, for example,
attain −1. For earlier work on rank correlation see Spearman (1904, 1906), K. Pearson
(1907) and “Student” (1921). The distribution of ρ in the case of independence was
given by Kendall and others (1939). Pitman (1937) had previously suggested that it
could be approximately represented by the B-distribution.

The coefficient τ was suggested by Kendall in 1938. In practice ρ is probably more
convenient. It is, however, remarkable that τ is unique among correlation coefficients in
depending only on linear processes, so that machines may be constructed to calculate it.
Furthermore, τ can be adapted to give partial rank correlation coefficients (Kendall, 1942).

The problem of m rankings was considered by Friedman in 1937 and by Babington
Smith and Kendall and by Wallis in 1939. Friedman (1940) has reviewed this work and
provided some useful tables based on the Type I approximative distribution. Wallis has
pointed out that the coefficient W is the ranking analogue of the correlation ratio. Kelley
(Statistical Method) had considered ρ_av as a measure of concordance in rankings.

See also Daniels (1944), Biometrika, 33, 129; Kendall (1945), Biometrika, 33, 239;
and Daniels and Kendall (1947), Biometrika, in press.

REFERENCES 

Eells, W. C. (1929), “Formulas for probable errors of coefficients of correlation,” J. Amer.
Statist. Ass., 24, 170.

Friedman, M. (1937), “The use of ranks to avoid the assumption of normality implicit
in the analysis of variance,” Jour. Amer. Statist. Ass., 32, 675.

- (1940), “A comparison of alternative tests of significance for the problem of m
rankings,” Ann. Math. Statist., 11, 86.

Hotelling, H., and Pabst, M. R. (1936), “Rank correlation and tests of significance
involving no assumption of normality,” Ann. Math. Statist., 7, 29.

Kendall, M. G. (1938), “A new measure of rank correlation,” Biometrika, 30, 81.

-, Kendall, S. F. H., and Babington Smith, B. (1939), “The distribution of
Spearman’s coefficient of rank correlation in a universe in which all rankings
occur an equal number of times,” Biometrika, 30, 251.

- and Babington Smith, B. (1939), “The problem of m rankings,” Ann. Math. Statist.,
10, 275.

- and Babington Smith, B. (1940), “On the method of paired comparisons,” Biometrika,
31, 324.

- (1942), “Partial Rank Correlation,” Biometrika, 32, 277.

Pearson, K. (1907), “On further methods of determining correlation,” Drapers Co. Memoirs,
Biometric Series IV, London, Dulau & Co.

Pitman, E. J. G. (1937), “Significance tests which may be applied to samples from any
populations; Part II. The correlation coefficient test,” J. Roy. Statist. Soc. Supp.,
4, 225, and (1938) “Part III. The analysis of variance,” Biometrika, 29, 322.

Spearman, C. (1904), “The proof and measurement of association between two things,”
Amer. J. Psychol., 15, 88.

- (1906), “A footrule for measuring correlation,” Brit. Jour. Psychol., 2, 89.

“Student” (1921), “An experimental determination of the probable error of Dr. Spearman’s
correlation coefficients,” Biometrika, 13, 263.

Wallis, W. A. (1939), “The correlation ratio for ranked data,” Jour. Amer. Statist. Ass.,
34, 533.

EXERCISES 

16.1. Show that the coefficients of rank correlation ρ between the natural order 1,
. . . 10 and the following rankings are −0.37 and +0.45 respectively.

7, 10, 4, 1, 6, 8, 9, 5, 2, 3;

10, 1, 2, 3, 4, 5, 6, 7, 8, 9.

Show that the corresponding values of τ are −0.24 and +0.60.
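
A reader may wish to confirm the stated values before attempting the algebra. The following sketch is illustrative only, with function names of our choosing: it computes ρ from the sum of squared rank differences and τ from the score S counted over all pairs, each against the natural order. The second ranking is taken as a permutation of 1 to 10, so the repeated 6 in some printings is read as 5.

```python
def spearman_rho(ranks):
    """Spearman's rho between `ranks` and the natural order 1..n."""
    n = len(ranks)
    d2 = sum((r - (i + 1)) ** 2 for i, r in enumerate(ranks))
    return 1 - 6 * d2 / (n**3 - n)

def kendall_tau(ranks):
    """Kendall's tau between `ranks` and the natural order 1..n (no ties)."""
    n = len(ranks)
    score = 0
    for i in range(n):
        for j in range(i + 1, n):
            # +1 for a pair in the natural order, -1 for an inverted pair
            score += 1 if ranks[j] > ranks[i] else -1
    return 2 * score / (n * (n - 1))

if __name__ == "__main__":
    first = [7, 10, 4, 1, 6, 8, 9, 5, 2, 3]
    second = [10, 1, 2, 3, 4, 5, 6, 7, 8, 9]
    for r in (first, second):
        print(round(spearman_rho(r), 2), round(kendall_tau(r), 2))
```

The second ranking shows how differently the two coefficients can weigh a single badly misplaced member.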

16.2. Defining

$$\chi_r^{2} = m(n-1)W,$$

show that approximately χ_r² is distributed as χ² in the Type III form with ν = n − 1
degrees of freedom. (Friedman, 1937.)

16.3. Show that W is the ratio of the sum of squares between columns and the total
sum of squares (the rankings being regarded as arrayed one below the other) and hence
that W is the square of the correlation ratio for such an array (the ranks being regarded
as variate-values). The “sum of squares between columns” means the sum of squares
of deviations of column means from their mean. (Wallis, 1939.)

16.4. Show that Spearman’s “footrule”

$$R = 1 - \frac{3\sum |d|}{n^{2} - 1}$$

can attain, but not exceed, the value 1, and can be as small as, but not smaller than, — 

16.5. Verify formula (16.63). 

16.6. The following table shows the preferences of 25 girls in 11 school subjects.

                             1   2   3   4   5   6   7   8   9  10  11   Totals

 1. Gymnastics               -  10  19  17  20  17  21  21  21  18  22     186
 2. Science                 15   -  12  15  17  15  21  19  18  16  17     165
 3. Art                      6  13   -  16  16  18  10  17  16  19  16     147
 4. Domestic Science         8  10   9   -  16  11  13  15  14  11  14     121
 5. History                  5   8   9   9   -  14  18  12  13  15  18     121
 6. Arithmetic               8  10   7  14  11   -  12  13  12  16  18     121
 7. Geography                4   4  15  12   7  13   -  14  15  14  14     112
 8. English Literature       4   6   8  10  13  12  11   -  14  13  14     105
 9. Religion                 4   7   9  11  12  13  10  11   -  11  17     105
10. Algebra                  7   9   6  14  10   9  11  12  14   -  12     104
11. English Grammar          3   8   9  11   7   7  11  11   8  13   -      88

                                                                Total     1375

Show that the coefficient of agreement u is 0.082; that this is significant; but that
the girls are less alike in preferences than the boys of Example 16.7.



APPENDIX TABLES 


APPENDIX TABLE 1 

Normal Distribution. Frequency Function of the Normal Distribution at every Tenth of the
Standard Deviation, with First and Second Differences. The value of the central ordinate
at zero is

 x/σ       f       Δ¹(−)     Δ²    |   x/σ       f       Δ¹(−)     Δ²

 0.0    0.39894     199     −392   |   2.5    0.01753     395      +79
 0.1    0.39695     591     −374   |   2.6    0.01358     316      +66
 0.2    0.39104     965     −347   |   2.7    0.01042     250      +53
 0.3    0.38139    1312     −308   |   2.8    0.00792     197      +45
 0.4    0.36827    1620     −265   |   2.9    0.00595     152      +36
 0.5    0.35207    1885     −212   |   3.0    0.00443     116      +27
 0.6    0.33322    2097     −159   |   3.1    0.00327      89      +23
 0.7    0.31225    2256     −104   |   3.2    0.00238      66      +17
 0.8    0.28969    2360      −52   |   3.3    0.00172      49      +13
 0.9    0.26609    2412        0   |   3.4    0.00123      36      +10
 1.0    0.24197    2412      +46   |   3.5    0.00087      26       +7
 1.1    0.21785    2366      +84   |   3.6    0.00061      19       +6
 1.2    0.19419    2282     +118   |   3.7    0.00042      13       +4
 1.3    0.17137    2164     +143   |   3.8    0.00029       9       +2
 1.4    0.14973    2021     +161   |   3.9    0.00020       7       +3
 1.5    0.12952    1860     +173   |   4.0    0.00013       4        -
 1.6    0.11092    1687     +177   |   4.1    0.00009       3        -
 1.7    0.09405    1510     +177   |   4.2    0.00006       2        -
 1.8    0.07895    1333     +170   |   4.3    0.00004       2        -
 1.9    0.06562    1163     +162   |   4.4    0.00002       -        -
 2.0    0.05399    1001     +150   |   4.5    0.00002       -        -
 2.1    0.04398     851     +137   |   4.6    0.00001       -        -
 2.2    0.03547     714     +120   |   4.7    0.00001       -        -
 2.3    0.02833     594     +108   |   4.8    0.00000       -        -
 2.4    0.02239     486      +91   |



Precision of Interpolation. Owing to the magnitude of the second differences, simple interpolation
near the beginning of the table may give an error up to 5 in the fourth place; the use of second
differences will bring this down to 1 or 2 in the last place, third differences being small. Where third
differences are greatest, in the neighbourhood of x/σ = 0.6, the error may be as large as 3 in the last
place unless the third difference is used.
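
The remark on precision can be illustrated at the mid-point of the first interval. The sketch below is an illustration under our own assumptions: it rebuilds the first few tabular ordinates to five places, forms the first and second differences as printed (the first difference taken with reversed sign), and compares linear interpolation with Newton's forward formula carried to the second difference at x/σ = 0.05.

```python
from math import exp, pi, sqrt

def ordinate(x):
    """Ordinate of the unit normal frequency function."""
    return exp(-0.5 * x * x) / sqrt(2.0 * pi)

# Tabular values at x = 0.0, 0.1, 0.2, 0.3, rounded as in the table.
table = [round(ordinate(i / 10), 5) for i in range(4)]

# First differences with sign reversed (units of the fifth place), then second differences.
d1 = [round((table[i] - table[i + 1]) * 1e5) for i in range(3)]
d2 = [d1[i] - d1[i + 1] for i in range(2)]

p = 0.5                                   # fraction of the interval: x = 0.05
exact = ordinate(0.05)
linear = table[0] + p * (table[1] - table[0])
second = linear + p * (p - 1) / 2 * (table[2] - 2 * table[1] + table[0])

if __name__ == "__main__":
    print(d1, d2)
    print(f"linear error  {exact - linear:.6f}")
    print(f"with 2nd diff {exact - second:.6f}")
```

The linear error is about 5 in the fourth place, and the second-difference term reduces it to about 1 in the fifth, in line with the note.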







APPENDIX TABLE 2 

Normal Distribution. The Distribution Function F of the Normal Distribution, tabulated at
every Tenth of the Standard Deviation, with First and Second Differences.

 x/σ       F       Δ¹(+)    Δ²(−)  |   x/σ       F       Δ¹(+)    Δ²(−)

 0.0    0.50000    3983       40   |   2.5    0.99379     155       36
 0.1    0.53983    3943       78   |   2.6    0.99534     119       28
 0.2    0.57926    3865      114   |   2.7    0.99653      91       22
 0.3    0.61791    3751      147   |   2.8    0.99744      69       17
 0.4    0.65542    3604      175   |   2.9    0.99813      52       14
 0.5    0.69146    3429      200   |   3.0    0.99865      38       10
 0.6    0.72575    3229      219   |   3.1    0.99903      28        7
 0.7    0.75804    3010      230   |   3.2    0.99931      21        7
 0.8    0.78814    2780      240   |   3.3    0.99952      14        3
 0.9    0.81594    2540      241   |   3.4    0.99966      11        4
 1.0    0.84134    2299      239   |   3.5    0.99977       7        -
 1.1    0.86433    2060      233   |   3.6    0.99984       5        -
 1.2    0.88493    1827      223   |   3.7    0.99989       4        -
 1.3    0.90320    1604      209   |   3.8    0.99993       2        -
 1.4    0.91924    1395      194   |   3.9    0.99995       2        -
 1.5    0.93319    1201      178   |   4.0    0.99997       1        -
 1.6    0.94520    1023      159   |   4.1    0.99998       1        -
 1.7    0.95543     864      143   |   4.2    0.99999       -        -
 1.8    0.96407     721      124   |   4.3    0.99999       -        -
 1.9    0.97128     597      108   |   4.4    0.99999       -        -
 2.0    0.97725     489       93   |
 2.1    0.98214     396       78   |
 2.2    0.98610     318       66   |
 2.3    0.98928     252       53   |
 2.4    0.99180     199       44   |

F attains the exact value 0.99999 between 4.26 and 4.27.


Precision of Interpolation. Simple interpolation may lead to an error of 3 or 4 at most in the fourth
place of decimals in the region where second differences are large; the use of the second difference will
bring this down to 2 or 3 in the last place, the largest errors tending to occur at the beginning of the
table, where the third difference may be used if the greatest possible precision is desired.
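
Where a value between tabular points is wanted to full accuracy, it may be simpler to compute F directly. A minimal sketch, assuming only the standard relation F(x) = ½[1 + erf(x/√2)] and using a function name of our own:

```python
from math import erf, sqrt

def normal_cdf(x):
    """Distribution function F of the unit normal distribution."""
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))

if __name__ == "__main__":
    for x in (0.0, 1.0, 2.0, 3.0):
        print(x, round(normal_cdf(x), 5))
    # The remark that F reaches 0.99999 between 4.26 and 4.27:
    print(normal_cdf(4.26) < 0.99999 < normal_cdf(4.27))
```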





APPENDIX


t-Table. The Distribution Function of $y = y_0\left(1 + t^{2}/\nu\right)^{-(\nu+1)/2}$ for Values of t

(Condensed to three figures from the four-figure

1 

1. 

2. 


n 

5, 

6, 

B 

8. ^ 

9. 

10. 

0 



0*500 

0*500 

0*500 

0-500 

0*500 

IHIH 




0*532 

0*536 

0*537 

mBMm 


0*538 




0*539 


0*563 


0*573 

0*674 

B! "' 9 

0*576 


0*677 

0*577 

0*677 


0*593 

0*604 



B?«' 9 

0*613 

0*614 

0*614 

0*614® 

0*616 


0-621 

0*636 

0*642 


BTTnB 

0*648* 

0-649' 

0*650 

0*651 

0*651 

06 

0*648 

0*667 



BTTT^I 

0*683 

0*684 

0*685 

0*685® 

0*686 

0*6 

0*672 

0*695 



0*713 

0*716 

0*716 

0*717 

0*718 

0*719 

0‘7 

0*694 

0*722 

0*733 


0*742 

0*746 

0*747 

0*748 

0*749 

0*750 

0*8 

0-716 

0*746 

0*769 



0*773 

0*775 

0*777 

0*772 

0*779 

00 

0*733 

0*768 

Hi 


0*796 

0*799 

0*801 

0*803 

0*804 

0*806 

10 

0*750 

0*789 

Hi :i?9 


0*818 

0*822 

0-826 

0*827 

0*828 

0*830 

M 

0*766 


Rmffl 


b^^^b 

0*843 

0*846 

0*848 

0*860 

0*851 

1-2 

, 0*779 


B/ 



0*862 

0*866 

O-feflS 

0*870 

0*871 

1*3 

^0*791 

0*838 

:^H 

0*868 


0*879 

0*883 

0*885 

0*887 

0*889 

1*4 


0*852 

- ^H 



0*894» 

0*898 

0-900' 

0-902' 

0*904 

1-6 

0*813 

0*864 

^R - 

0*896 

0*903 

0*908 

0*911 

0*914 

0*916 

0*918 

1-6 

0*822 

0*875 




0*920 

0*923 

0-926 

0*928 

0*930 

1*7 

0*831 

0-884 




0*930 

0*933® 

0*936 

0*938 

0*940 

1*8 

0*839 





0*939 

0*943 

0*945 

0*947 

0*949 

1-9 

0*846 



0*935 


0*947 

0*960 

0*963 

0*955 

0*967 

20 

0*852 

0*908 




0*964 

0*957 

0*960 

0*962 

0*963 

21 

0*868» 

0*915 



0*956 

0*960 

0*963 

0*965® 

0*967 

0*969 

22 

0*864 

0*921 


0*954 


0-966 

0*968 

0*970® 

0*972 

0*974 

2-3 

0*869‘ 

0*926 



0*965 

0*969 

0*972® 

0*975 

0*976® 

0*978 

2*4 

0*874 

0*931 

0*952 


b^^^b 

0*973 

0*976 

0*978 

0*980 

0*981 

2*5 

0*879 

0-936 

0*956 



0*977 

0*979® 

0-981“ 

0*983 

0*984 

2'0 

0*883 

0*939 


0-970 


0*980 

0-982 

0*984 

0*986 

0*987 

2-7 

0*887 

0*943 

0*963 

0*973 

0*979 

0*982 

0-986 

0*986® 

0*988 

0*989 

2-8 

0*891 

0-946 

0*966 

0-976 


0*984 

0-987 

0*988 

0*990 

0*991 

2*9 

0*894 

0*949 


0*978 

0*983 

! 0*986 

0*988® 

0*990 

0*991 

0*992 

3*0 

0*898 

0*952 

0*971 

W2j£l|H 

0*986 

0*988 

0*990 

0*991® 

0*992® 

0*993 

8*1 

0*901 

0*956 




0*989 

0-991 

0*993 

0*994 

0*994 

3'2 


0*957 

0*975 



0*991 

0*992® 

0*994 

0*995 

0*995 

3*3 

0*908 

0*960 

0*977 

0*986 

0*989 

1 0*992 

0*993 

0-995 

0*996 

0*996 

3*4 

0*909 



0*986 

0*990 

; 0*993 

0*994 

0*995 

0*996 

0*997 

3*5 

0*911 

0*964 




1 0*994 

0*996 

0*996 

0*997 

0*997 

3*6 

0*914 

0*966 

0*982 



0*994 

0*996 

0*996® 

0*997 

0*998 

3*7 

0*916 

0*967 

0*983 



0*995 

0*996 

0*997 

0*997® 

0*998 

3*8 

0*918 

0*969 

! 7H 

0*990 


1 0*996» 

0*997 

0*997 

0*998 

0*998 

3*9 




0*991 

0*994 

1 0*996 

0*997 

0*998 

0*998 

0*998® 

4*0 

0-922 

0*971 


0*992 


0*996 

0*997 

0*998 

0*998 

0*999 

41 

0*924 

0*973 


0*993 

0-906 

0*997 

0*998 

0*998 

0*999 

0*999 


0*926 

0*974 


0*993 

0-996 

0*997 

0*998 

0*998® 

0*999 

0*999 

HH 

0-927 

0-975 


0*994 

11 

0*997« 

0*998 

0*999 

0*999 

0*999 

4*4 

0*929 

0*976 


0*994 


0*998 

0*998 

0*999 

0*999 

0*999 

4*5 


0*977 

0*990 

0*995 


0*998 

0*999 

0*999 

0*999 

0*999 

4*6 

0*932 

0*978 

0*990 



0*998 

0*999 

0*999 

0*999 

0*999® 

4*7 

0*933 



0*995 


0*998 

0*999 

0*999 

0*999 

1*000 

4*8 

0*936 

0*980 


0*996 

^R" i 

0*998® 

0*999 

1 0*999 

0*999® 


4*9 


0*980 

0*992 

0*996 

0*998 

0*999 

0*999 

1 0*999 

1*000 


50 

0*937 

0*981 




0*999 

0*999 

I 0*999® 



51 

0*938 

0*982 

0*993 

0-996' 

0*998 

0*999 

0*999 

! 0*999® 



5*2 

0-939' 


0-993 

0*997 

0*998 

0*999 

0*999 

i 1*000 



5*3 

0*941 

0*983 


0*997 

0*998 

0*999 

0*999 



I 

5*4 

0*942 

0*984 

0*994 

0*997 


0*999 

0*999® 




5*6 

0*943 

0*984 

0*994 

0*997 

0*999 

0*999 

0*999® 




5*6 

0*944 

0*986 


0*9975 


0*999 

1*000 


I 


5*7 

0*945 

0-985 


0*998 

0*999 

0*999 





6*8 

0*946 

0*986 

0*996 

0*998 


0*999 





6*9 

0*947 

0*986 

0*995 

0*998 


0*999® 





6*0 , 

0*947 

0*987 

0*996 

0*998 

0-999 

0*9995 







TABLE 3

proceeding by Intervals of 0.1 from 0 to 6, and for Values of ν from 1 to 20.


tables by “Student” in Metron, 5, 1925.)


u 

11. 

12. 

18. 

14. 

16. 

16. 

17. 

18. 

19. 

20. 

0 

0*li00 

o-soo 

0*500 

0-600 

0*600 

0*600 

0*500 

0*600 

0-600 

0*500 

01 

0*639 

0-63» 

0*539 

0*636 

0*639 

0-539 

0*539 

0*539 

0*639 

0*539 

0*2 

0*677 

0*578 

0*678 

0*678 

0*678 

0*578 

0*678 

0*578 

0*678 

0-678 

0*8 

0*616 

0*616 

0*616“ 

0*616 

0*616 

0-616 

0*616 

0*616 

0*618 

0*616 

0*4 

0*662 

0*652 

0*662 

0*652 

0*663 

0*653 

0*663 

0*653 

0*663 

0*653 

0*5 

0*686“ 

0*687 

0*687 

0*688 

0*688 

0-688 

0*688 

0*688 

0*689 

0*689 

0*6 

0*720 

0-720 

0*721 

0*721 

0*721 

0*721“ 

0*722 

0*722 

0*722 

0*722 

0*7 

0*761 

0*761 

0*762 

0*762 

0*763 

0*753 

0*763 

0-764 

0*754 

0*764 

0*8 

0-780 

0*780 

0*781 

0*781“ 

0*782 

0*782 

0*783 

0*783 

0-783 

0*783 

0*9 

0*806 

0-807 

0-808 

0*808 

0*809 

0-809 

0*810 

0*810 

0*810 

0*811 

1*0 

0*831 

0*831“ 

0-832 

0*838 

0*833 

0*834 

0*834 

0*835 

0-836 

0*836 

1*1 

0*853 

0-853‘ 

0*854 

0*866 

0-856 

0*866 

0*867 

0*867 

0*867“ 

0*858 

1*2 

0*872 

0-873 

0*874 

0-876 

0*876 

0*876 

0*877 

0*877 

0-878 

0*878 

1*3 

0*890 

0*891 

0*892 

0*893 

0*893 

0*894 

0*894“ 

0*895 

0*895 

0*896 

1*4 

0*906“ 

0*907 

0*907“ 

0*908 

0*909 

0*910 

0*910 

0*911 

0*911 

0*912 

1*5 

0*919 

0-920 

0*921 

0*922 

0*923 

0*923“ 

0*924 

0*924“ 

0-926 

0*926 

1*6 

0*931 

0*932 

0*933 

0-934 

0*936 

0-935 

0*936 

0*936“ 

0-937 

0*937 

1*7 

0*941 

0*943 

0-943 ‘ 

0*944 

0*946 

0*946 

0*946 

0*947 

0*947 

0*948 

1*8 

0*950 

0-961 ‘ 

0*962“ 

0*963 

0*964 

0-965 

0*955 

0*966 

0*956 

0*956“ 

1*9 

0*058 

0*969 

0*960 

0*961 

0*962 

0*962 

0*963 

0*963 

0*964 

0*964 

2*0 

0*965 

0*966 

0*967 

0*967 

0*968 

0-969 

0*969 

0*970 

0*970 

0*970 

2*1 

0*970 

0*971 

0*972 

0*973 

0*973“ 

0*974 

0*974“ 

0*976 

0*976 

0-976 

2*2 , 

0*975 

0*976 

0*977 

0*977 

0*978 

0*979 

0*979 

0*979 

0*980 

0*980 

2*3 

0*979 

0*980 

0*981 

0*981 1 

0*982 

0*982 

0*983 

0*983 

0*983“ 

0*984 

2*4 ! 

0*982 

0*983 

0*984 

0*985 

0*985 

0*985“ 

0*986 

0*986 

0*987 

0*987 

2*5 

0*985 i 

0*986 

0*987 1 

0*987 

0*988 

0*988 

0*988“ 

0*989 

0*989 

0*989 

2*6 

0*988 

0*988 

0*989 

0*989“ 

0*990 

0*990 

0*991 

0*991 

0*991 

0*991 

2*7 

0*990 

0*990 

0*991 

0*991 

0*992 

0*992 

0*992 

0*993 

0*993 

0*993 

2*8 

0*991 

0*992 

0*992“ 

0*993 

0*993 

0*994 

0*994 

0*994 

0*994 

0*994“ 

2*9 

0*993 

0*993 

0*994 

0*994 

0*994“ 

0*994“ 

0*996 

0*995 

0*995 

0*996 

3*0 

0*994 

0*994“ 

0*995 

0*996 

0*995“ 

0*996 

0*996 

0*996 

0*996 

0*996“ 

3*1 

0*996 

0*995 

0*996 

0*996 

0*996 

0*997 

0*997 

0*997 

0*997 

0*997 

3*2 

0 996 

0*990 

0*996“ 

0*997 

0*997 

0*997 

0*997 

0*997“ 

0*998 I 

0*998 

3*3 

0*996“ 

0*997 

0*997 

0*997 

0*998 

0*998 

0*998 

0*998 

0*998 

0*998 

3*4 

0*997 

0*997 

0*998 

0*998 

0*998 

0*998 

0*998 

0*998 

0*998“ 

0*999 

3*6 

0*997“ 

0*998 

0*998 

0*998 

0*998 

0*998“ 

0*999 

0*999 

0*999 

0*999 

3*6 

0*998 

0*998 

0*998 

0*999 

0*999 

0*999 

0*999 

0*999 

0*999 

0*999 

3*7 

0*998 

0*998“ 

0*999 

0*999 

0*999 

0*999 

0*999 

0*999 

0*999 

0*999 

3*8 

0*998“ 

0*999 

0*999 

0*999 

0*999 

0*999 

0*999 

0*999 

0*999 

0*999 

3*9 

0*999 

0*999 

0*999 

0*999 

0*999 

0*999 

0*999 

0*999“ 

0*999“ 

1000 

4*0 

0*999 

0*999 

0*999 

0*999 

0*999 

0*999“ 

0*999“ 

1*000 

1*000 


4*1 

0*999 

0*999 

0*999 

0*999“ 

0*999“ 

1*000 

1*000 




4*2 

0*999 

0*999 

0*999“ 

1*000 

1*000 






4*3 

0*999 

0*999“ 

1*000 








4*4 

0*999“ 

1*000 









4*5 

0*999“ 










4*6 

1*000 











Note. The methods by which “Student” calculated the Metron tables are explained in notes by him
and R. A. Fisher in that journal, vol. 5, Part 3, 1925, pp. 18-24. The four figures of those values have been
rounded up to three in the above table, except when the four-figure value concluded with a 5, in which case
it is shown in full. In columns in which values greater than 0.9995 occur the first is written 1.000 and the
remainder left blank.




APPENDIX TABLE 4 

(Reprinted from Table VI of Prof. R. A. Fisher’s Statistical Methods for Research Workers,
Oliver and Boyd, Ltd., Edinburgh, by kind permission of the author and the publishers.)

5 Per Cent. Points of the Distribution of z.








Values of 







1. 

2. 

3. 

4. 

6, 

6. 

8. 

12. 

24. 

00* 


1 

2-6421 

2-6479 

2*6870 

2*7071 

2*7194 

2*7276 

2*7380 

2*7484 

2*7688 

2*7693 


2 

14692 

1*4722 

1 4765 

1*4787 

1*4800 

1 *4808 

1*4819 

1*4830 

1*4840 

1*4851 


3 

M677 

1*1284 

1*1137 

1*1051 

1*0994 

1*0953 

1*0899 

1*0842 

1*0781 

1*0716 


4 

10212 

0*9690 

0*9429 

09272 

0*9168 

0*9093 

0-8993 

0*8885 

0*8767 

0*3639 


6 

0-9441 

0*8777 

0*8441 

0*8236 

0*8097 

0*7997 

0*7862 

0*7714 

0*7650 

0*7368 


6 

0-8948 

0-8188 

0 7798 

0-7558 

0-7394 

0*7274 

0*7112 

0*6931 

0*6729 

0*6499 


7 

0*8606 

0*7777 

0*7347 

0*7080 

0*6896 

0*6761 

0*6576 

0-6369 

0*6134 

0-5862 


8 

0*8366 

0*7476 

0*7014 

0-6726 

0*6626 

0*6378 

0*6176 

0*5945 

0*5682 

0*5371 


9 

0-8163 

0*7242 

06767 

0*6460 

0*6238 

0*6080 

0-5862 

0*6613 

0*6324 

0*4979 


10 

0*8012 

0*7058 

0*6653 

06232 

0*6009 

0-6843 

0*6611 

0*5346 

0-5035 

0*4667 


11 

0*7889 

0*6909 

0*6387 

0-6066 

0*5822 

0*5648 

0*6406 

0*5126 

0*4795 

0*4387 


12 

0*7788 

0*6786 

0*6260 

0-6907 

0*6666 

0*6487 

0-5234 

0*4941 

0*4592 

0*4166 


13 

0*7703 

0*6682 

0*6134 

0*6783 

0-5636 

0*6350 

0*5089 

0-4786 

0*4419 

0*3957 


u 

0*7630 

0*6694 

0*6036 

0*6677 

0*6423 

0*5233 

0*4964 

0-4649 

0-4269 

0 3782 


15 

0*7668 

0*6618 

0*6950 

0*5586 

0*5326 

0*5131 

0*4856 

0*4632 

0*4138 

0*3628 


16 

0*7614 

0*6451 

0*5876 

0*5505 

0*6241 

0*5042 

0*4760 

0*4428 

0*4022 

0*3490 


17 

0*7466 

0*6393 

0*6811 

0*5434 

0*5166 

0*4964 

0*4676 

0 4337 

0*3919 

0*3366 


18 

0*7424 

0*6341 

0*6763 

0 5371 

0*5099 

0*4894 

0*4602 

0*4265 

0*3827 

i 0 3253 


19 

0*7386 

0*6295 

0*5701 

0*5316 

0*5040 

0*4832 

0*4535 

0*4182 

0*3743 

[ 0 3151 


20 

0*7352 

0*6254 

0*5654 

0*6266 

0*4986 

0*4776 

0*4474 

0*4116 

0 3668 

i 0*3067 


21 

0*7322 

0*6216 

0*5612 

0*5219 

0*4938 

0*4725 

0*4420 

0*4055 

0*3699 

0*2971 


22 

0*7294 

0*6182 

0 5574 

0*5178 

0*4894 

0*4679 1 

0*4370 

0*4001 

0 3536 

0 2892 


23 

0*7269 

0*6151 

0*5540 

0*5140 

0*4854 

0*4636 

0*4325 

0 3950 1 

0*3478 

0*2818 


24 

0*7246 

0*6123 

0*5608 

0*5106 i 

0*4817 

0*4598 

1 0*4283 

0*3904 

0 3425 

0 2749 


25 

0*7226 

0*6097 

0*5478 

0*5074 

0*4783 

0*4562 1 

0*4244 

0*3862 

0 3376 

0*2685 


26 

0*7206 

0 6073 

0*5461 

0*5045 

0 4752 

0*4629 

0*4209 

0 3823 

0 3330 

0 2626 


27 

0 7187 

0*6051 

0*5427 

0*5017 

0*4723 

0*4499 

0*4176 

0*3786 

0*3287 

0*2569 


28 

0*7171 

0*6030 

0*5403 

0*4992 

0*4696 

0*4471 1 

1 0*4146 

0*3762 

0*3248 

0*2516 


29 

0*7155 

0*6011 

05382 

0*4969 

0*4671 

0*4444 

0*41)7 

0*3720 

0*3211 

0*2466 


30 

0*7141 

0*5994 

0*5362 

0*4947 

0*4648 

0*4420 ! 

0*4090 

0*3691 

1 

0*3176 

0*2419 


60 

0*6933 

0*5738 

0*5073 

0*4632 

0*4311 

0*4064 

0*3702 1 

0*3265 

0*2654 

j 

0*1644 


00 

0*6729 

0 6486 

0*4787 1 

0*4319 

0*3974 

0*3706 

0*3309 

0*2804 

0*2086 

0 


APPENDIX TABLE 5

(Reprinted from Table VI of Prof. R. A. Fisher’s Statistical Methods for Research Workers,
Oliver and Boyd, Ltd., Edinburgh, by kind permission of the author and the publishers.)

1 Per Cent. Points of the Distribution of z.



1. 

2. 

3. 

4. 

6. 

6* 

8. 

12. 

24. 

00. 


1 

41535 

4-2685 

4-2974 

4-3175 

4-3297 

4-8379 

4-3482 

4 3585 

4-3689 

4-3794 


2 

2'2950 

2-2976 

2-2984 

2-2988 

2-2991 

2-2992 

2-2994 

2-2997 

2-2999 

2-3001 


3 

1-7649 

1-7140 ' 

1-6916 

1-6786 

1-6703 

1-6645 

1-6669 

1-6489 

1-6404 

1-6314 


4 

1-5270 

1-4452 

1-4075 

1-3856 

1-3711 

1-3609 

1-3473 

1 3327 

1-3170 

1-3000 


5 

1-3943 

1-2929 

1-2449 

1-2164 

1-1974 

1-1838 

1-1666 

1-1457 

1-1239 

1-0997 


6 

1-3103 

1-1955 

1-1401 

M068 

1-0843 

1-0680 

1-0460 

1-0218 

0-9948 

0-9643 


7 

1-2526 

1-1281 

1-0672 

1-0300 

1-0048 

0-9864 

0-9614 

09335 

0-9020 

0-8668 


8 

1-2106 

1-0787 

1-0136 

0-9734 

0-9459 

0-9259 

0-8983 

08673 

0-8319 

0-7904 


9 

M786 

1-0411 

0-9724 

0-9299 

0-9006 

0-8791 

0-8494 

0-8157 

0-7769 

0-7306 


10 

M535 

1-0114 

0-9399 

0-8954 

0-8646 

0-8419 

0-8104 

0-7744 

0-7324 

0-6816 


11 

1 1333 

0-9874 

0-9136 

0-8674 

0-8354 

0-8116 

0-7786 

0-7405 

0-6958 

0-6408 


12 

1-1166 

0-9677 

0-8919 

0^8443 

0-8111 

0-7864 

0-7620 

0-7122 

0-6649 

0-6061 


13 

1-1027 

0-9511 

0-8737 

0-8248 

0-7907 

0-7652 

0-7295 

0-6882 

0-6386 

0-6761 

JfS 

14 

1 0909 

0-9370 

0-8681 

0-8082 

0-7732 

0-7471 

0-7103 

0-6675 

0-6159 

0-6600 


15 

1 -0807 

0-9249 

0-8448 

0-7939 

0-7582 

0-7314 

0-6937 

0-6496 

0-5961 

0-5269 

o 

16 

1 0719 

0-9144 

0-8331 

0-7814 

0-7460 

0-7177 

0-6791 

0-6339 

0-5786 

0-5064 

TO 

O 

17 

1-0641 

0 9051 ; 

0-8229 

0-7705 

0-7336 

0-7057 

0-6063 

0-6199 

0-6630 

0-4879 


18 

1 0572 

0-8970 

0-8138 

0-7607 

0 7232 

0-6950 

0-6549 

0-6075 

0-5491 

0 4712 


19 

1-0511 

0-8897 

0-8057 

0-7521 

0-7140 

0-6854 

0*6447 

0-5964 

0-5366 

0-4560 


20 

1-0457 

0-8831 

0-7985 

0-7443 

0-7058 

0-6768 

0-6366 

0-5804 

0-5253 

0-4421 


21 

1 0408 

0-8772 

0-7920 

0-7372 

0 6984 

0-6690 

0-6272 

0-5773 

0-5150 

0-4294 


22 

1 0363 

0-8719 

0-7860 

0-7309 

0-6916 

0-6620 

0-6196 

0-5691 

0-5056 

0-4176 


23 

1 1 0322 

0-8670 

0-7806 

0-7251 

0-6865 

0-6555 

0-6127 

0-5615 

0-4969 

0-4068 


24 

' 1-0286 

0-8626 

0-7767 

0-7197 

0-6799 

0-6496 

0-6064 

0-6545 

0-4890 

0-3967 


25 

! 1-0261 

0-8685 

0-7712 

0-7148 

0-6747 

0-6442 

0-6006 

0*6481 

0-4816 

0-3872 


26 

1-0220 

0-8648 i 

0-7670 

0-7103 

0-6699 

0-6392 

0-5952 

0-5422 

0-4748 

0-3784 


27 

1-0191 

0-8513 

0-7631 

0-7062 

0-6655 

0-6346 

0-6902 

0-6367 

0-4685 

0-3701 


28 

1-0104 

0-8481 

0-7695 

0-7023 

0-6614 

0-6303 

0-5856 

0-6316 

0-4626 

0-3624 


29 

: 1-0139 

0-8451 

0-7562 

0-6987 

0-6576 

0-6263 

0-5813 

0-6209 

0-4570 

0-3550 


30 

1-0116 

0-8423 

0-7531 

0-6964 

0-6640 

0-6226 

0-5773 

0-5224 

0-4619 

0-3481 


60 

0-9784 

0-8026 

0-7086 

0-6472 

0-6028 

0-5687 

0-6189 

0-4674 

0-3746 

0-2362 

1_ 

00 1 
1 i 

0-9462 

1 

0-7636 

0-6651 

0-6999 

0-5522 

0-6162 

0*4604 

0-3908 

0-2913 

0 




APPENDIX TABLE 6

Distribution Function of χ² for One Degree of Freedom for Values of χ² = 0 to
χ² = 1 by Steps of 0.01.



F 

4 


P 

4 

0 

1-00000 

wa 

0*60 

0*47950 

436 

001 

0*92034 

3280 

0*61 

0*47514 * 


002 

0*88764 

2606 

0*52 

0*47084 

423 

003 

0*86249 

2101 

0*63 

0*46661 

418 

004 

0*84148 

1842 

0-64 

0*46243 

411 

006 

0-82306 

1656 

0*55 

0*45832 


006 

0*80660 

1516 

0*56 

0*45426 


fei 0*07 

0*79134 

1404 

0*67 

0*45026 

395 

008 

0-77730 

1312 

0*68 

0*44631 

389 

0*00 

0-76418 

1236 

0*59 

0-44242 

384 

0*10 

0*76183 

1169 

0*60 

0 43868 

379 

0*11 

0*74014 

nil 

0*61 

0*43479 

374 

0*12 

0*72903 

1060 

0*62 

0*43106 

369 

0*13 

0*71843 

1016 

0*63 

0-42736 

365 

0*14 

0*70828 

074 

0*64 

0*42371 

360 

0*16 

0*69864 

938 

0*66 

0*42011 


016 

0*68916 

906 

0*66 

0*41666 

361 

017 

0*68011 

874 

0*67 

0*41306 

346 

0*18 

0*67137 

846 

0*68 

0-40969 

343 

0*19 

0*66292 

820 

0*69 

0*40616 

338 

0-20 

0*65472 

796 

0*70 

0*40278 

334 

0-21 

0*64677 

773 

0*71 

0 39944 

330 

0*22 

0*63904 

762 

0*72 

0*39614 

326 

0*23 

0-63152 

731 

0*73 

0-39288 

322 

0*24 

0*62421 

713 

0*74 

0*38966 

318 

0*26 

0*61708 

696 

0*76 

0*38648 

315 

0 26 

0*61012 

679 

0*76 

0 38333 

311 

0*27 

0*60333 

663 

0*77 

0*38022 

308 

0*28 

0*69670 

648 

0 78 

0*37714 

304 

029 

0*59022 

634 

0*79 

0*37410 

301 

0*30 

0*58388 

620 

0*80 

0*37109 

297 

0*31 

0*67768 

607 

0*81 

0*36812 

^ 294 

0*32 

0*57161 

695 

0*82 

0*36518 

291 

0*33 

0*56666 

583 

0 83 

0*36227 

287 

0*34 

0*55983 

672 

0*84 

0*36940 

285 

0*36 

0*66411 

660 

0*85 

0*35655 

281 

0*36 

0*54851 

661 

0*86 

0*36374 

278 

0*37 

0*64300 

640 

0*87 

0*36096 

276 

0*38 

0*63760 

630 

0*88 

0*34820 

272 

0*39 

0*53230 

621 

0*89 

0 34548 

270 

0*40 

0*62709 

612 

0*90 

0*34278 

267 

0*41 

0*52197 

503 

0*91 

0 34011 

264 

042 

0*61694 

495 

0 92 

0 33747 

261 

0*43 

0*51199 

487 

0*93 

0 33486 

258 

0*44 

0*50712 

479 

0*94 

0*33228 

266 

0*46 

0*50233 

471 

0*95 

0*32972 

263 

0*46 

0*49762 

463 

0*96 

0*32719 

251 

0*47 

0*49299 

467 

1 0*97 

0*32468 

248 

0*48 

0*48842 

449 

0*98 

0*32220 

246 

0*49 

0*48393 

443 

0*99 

0*31974 

243 

0*60 

0*47950 

436 

1*00 

0*31731 

241 
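
Entries of this table can be regenerated when an intermediate value is needed. The sketch below is illustrative, with a function name of our own; it relies on the fact that for one degree of freedom χ² is the square of a unit normal deviate, so that the tabulated P, the probability of exceeding χ², equals erfc(√(χ²/2)).

```python
from math import erfc, sqrt

def chi2_one_df_tail(chi2):
    """P(chi-square with 1 d.f. exceeds chi2); equals 1 at chi2 = 0."""
    return erfc(sqrt(chi2 / 2.0))

if __name__ == "__main__":
    for value in (0.0, 0.01, 0.50, 1.00):
        print(f"{value:.2f}  {chi2_one_df_tail(value):.5f}")
```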









APPENDIX TABLES 


445 


APPENDIX TABLE 7 

Distribution Function of χ² for One Degree of Freedom for Values of χ² from 1 to 10 by
Steps of 0.1.
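
The quantity tabulated for one degree of freedom has a closed form, which may help in checking entries against the scan. The following is a minimal sketch in Python, assuming (as the surviving entries of the preceding table suggest, e.g. 0.31731 at χ² = 1.00) that the tabulated quantity is the upper-tail probability P = 1 − F(χ²); the function names are illustrative and do not come from the original.

```python
import math


def chi2_upper_tail_1df(x):
    """P(chi-square with 1 d.f. exceeds x), i.e. 1 - F(x).

    For one degree of freedom chi-square is the square of a standard
    normal variate, so the tail equals erfc(sqrt(x / 2)).
    """
    if x < 0:
        raise ValueError("chi-square cannot be negative")
    return math.erfc(math.sqrt(x / 2.0))


def table(start=1.0, stop=10.0, step=0.1):
    """Rows (chi-square, P) from start to stop inclusive.

    The range 1 to 10 by steps of 0.1 is taken from the table heading;
    the row layout is an assumption.
    """
    n = int(round((stop - start) / step))
    rows = []
    for i in range(n + 1):
        x = round(start + i * step, 10)  # avoid accumulated float drift
        rows.append((x, chi2_upper_tail_1df(x)))
    return rows


if __name__ == "__main__":
    for x, p in table():
        print(f"{x:5.1f}   {p:.5f}")
```

The first differences printed alongside P in the preceding table can be recovered by subtracting consecutive rows.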


INDEX 


(References are to pages.) 

Abortion, distribution of women according to term 
of, (Table 1.23), 26. 

Abrupt distributions, corrections for grouping to, 
79; refs., 85-6. 

Absolute moments, 66; Liapounoff’s inequality 
for, (Exercise 3.14), 88. 

Accidents, exemplified by Poisson distribution, 
124. 

Adyanthaya, N. K., refs., distribution of frequency 
constants in small samples, (under Pearson), 
228. 

Age, correlation with highest audible pitch, 
(Table 14.1), 326; (Example 14.1), 331. 

Agreement, coefficient of, 427-9; significance of, 
429-36. 

Agricultural Research Institute, Oxford, data 
from Report of, (Table 1.9), 9. 

Alcoholism and crime, Goring’s data on, (Table 
14.6), 356. 

- and health, (Exercise 14.12), 366. 

Ammon, O., data from (hair and eye-colour), 
(Table 12.4), 300. 

Anthropometric Committee of British Association, 
data from Report of, (Table 1.7), 8; 
(Table 1.10), 10. 

Antimode, definition of, 36. 

Approximations to sampling distributions, see 
Sampling distributions. 

Arithmetic mean, see Mean, arithmetic. 

Aroian, L. A., fitting of Type B distribution to 
data, (Example 6.4), 156 ; refs., Type B 
series, 160. 

Arrays, in bivariate distributions, 327. 

Association, generally, 308-23; coefficients of, 
310-13 ; partial, 313-17 ; illusory, 317-18. 

Asymmetrical distributions, 10; see Skewness. 

Attributes, sampling of, 197-201; in Poisson 
distribution, (Exercise 8.2), 203; in finite 
populations, (Exercise 8.3), 203. 

Australian marriages, distribution of, (Table 1.8), 
9; moments of, (Example 3.1), 50-2; 
and of, (Example 3.16), 82. 

Average, see Mean. 

- corrections to moments, 74-5; see also 
Sheppard’s corrections. 

b₁, (sampling value of measure of skewness), 
279-80. 

Babington Smith, B., data from, on random numbers, 
(Table 8.3), 189; Random Sampling 
Numbers, 193, 197; refs. (under Kendall), 
202; distribution of Spearman’s ρ, problem 
of m rankings, method of paired comparisons, 
436. 

Baker, G. A., distribution of means in Type A 
series, (Exercise 10.7), 252. 

Bayes, T., refs., doctrine of chances, 183. 

Bayes’ theorem, 175-7; postulate, 176-8; comparison 
with maximum likelihood, 178-80; 
in estimating proportion of attributes, 200; 
(Exercise 8.6), 203. 

Beans, distribution of, (Table 1.15), 20; histogram 
of, (Figure 1.4), 20; fitting of Pearson distribution 
to, (Example 6.1), 143-4; Gram-Charlier 
series fitted to, (Example 6.2), 151. 

Bernouilli polynomials, definition of, 58. 

-numbers, footnote, 69; 71, 78. 

Bernstein, S., refs., extension of central limit 
theorem, 183. 

β₁, β₂ (skewness and kurtosis), 81; standard errors 
of, 226; sampling distributions of, 279-80, 
(Exercise 11.17), 289; generalised β, 82. 

B-function, in summing binomial, 120. 

Bias in sampling, 187-90; in choosing plants, 
(Example 8.1), 187-8; in scale reading, 
(Example 8.2), 188; in reading randomising 
machine, (Example 8.3), 189; in crop-reporting, 
(Example 8.4), 189-90. 

Binomial distribution, general properties, 116-20; 
moments of, 117, (Example 3.2), 52; distribution 
function of, 119-20; and y, 
of, (Example 3.17), 82; factorial moments 
of, (Exercise 3.6), 87; limiting form, 
(Example 4.6), 103; arising from mixed 
population, 122-4; with negative index, 
125-6 and (Exercise 5.7), 136; bivariate 
form, 133-4; cumulants of, (Exercise 5.1), 
135; incomplete moments of, (Exercises 
5.2 and 5.3), 135; in sampling of attributes, 
198; distribution of means of, (Example 
10.8), 243. 

Birth-rates, distribution of in Local Government 
Areas, (Table 1.1), 3 ; frequency polygon 
of, (Table 1.1), 4. 

Biserial η, 356-8. 

Bivariate binomial distribution, 133-4. 

- frequency-distributions, 19-22. 

- moments and cumulants, 79-81; standard 
errors of, 211; k-statistics and cumulants, 
281-3. 

-normal distribution, 22; (Example 3.16), 
79-80; moments of, (Exercise 3.16), 89; 
as limit of bivariate binomial, 133-4; 
correlation and regression of, 334-^; multivariate 
form, 376-7. 

Bivariate Poisson distribution, (Exercise 5.8), 136. 

Borel, E., refs., Traité du Calcul des Probabilités, 
22, 188. 

Bortkiewicz, von, data from on suicide, (Table 
1.6), 7. 

Bose, S. S., distribution of variance ratio, (Exercise 
14.8), 365. 

Bowley, A. L., refs., F. Y. Edgeworth’s Contributions 
to Mathematical Statistics, 160; representative 
method, 202. 

Brood-mares, distribution of fecundity in, (Table 
1.20), 24. 

Call discount rate, distribution of weekly returns 
according to, (Table 1.26), 28. 

Camp, B. H., refs., distribution functions of binomial 
and hypergeometric, 134. 

Card-shuffling, tested by χ², (Example 12.1), 297-9; 
tested by rankings, (Example 16.6), 420. 

Carleman, T., criteria for uniqueness in the problem 
of moments, 109; refs., Les fonctions 
quasi-analytiques, 114. 

Carver, H. C., Sheppard corrections for discrete 
variables, 85. 

Cauchy distribution, (Example 3.12), 67-8; characteristic 
function of, (Example 4.2), 95-6; 
distribution of mean of samples from, 
(Example 10.1), 233-4 and (Example 10.15), 
247. 

Cave, B. M., refs., sampling of correlation coefficient 
(under Co-operative Study), 363. 

Census of Population, data from Housing Report, 
(Table 1.24), 27. 

-of Production, data from, on size of firms, 

(Table 1.17), 23. 

Central Limit Theorem, 180-3. 

Characteristic functions, as moment-generating 
functions, 64; general theory of, 90-115; 
limiting properties of, 99-104; multivariate, 
104-5; conditions for a function 
to be, 98-9 ; in sampling distributions, 
242-6. 

Charlier, C. V. L., Types A and B series, 147 (see 
Gram-Charlier series); refs., expansion of 
frequency functions, 160. 

Cheshire, L., refs., significance of correlation coefficient, 
(under E. S. Pearson), 363. 

Chi-square, see χ². 

χ²-distribution, generally, 290-307; properties 
of, 292-7; in 2 × 2 tables, 303; correction 
for continuity, 303-4; as square contingency, 
319. 

Cholera, inoculation against, (Table 12.6), 302; 
(Example 13.1), 309; (Example 13.2), 
311-12; (Example 13.3), 313. 



Church, A. E. R., sampling moments, 846; 
284-5. 

Circular triads, in preferences, 423. 

Class-frequency, definition, 2. 

Class-interval, definition, 2 ; ambiguities in, 6. 
Cloudiness, distribution of days according to, 
(Table 1.11), 10. 

Cochran, W. G., refs., χ²-distribution, 306. 

“Cocked-hat” as synonym for unimodal, 29. 

Coefficients of association, contingency, correlation, 
etc., see under Association, Contingency, 
Correlation, etc. 

Coin-tossing, as example of sampling of attributes, 
(Example 8.9), 198. 

Colligation, coefficient of, 311. 

Combinatorial method, in sampling of k-statistics, 
see k-statistics. 

Comparisons, paired, see Paired comparisons. 

Comrie, L. J., refs., Tables of arc tan x and log 
(1 + x²), 160. 

Concentration, coefficient of, 43; curve of, 43-4 
and (Figure 2.3), 44. 

Concordance, coefficient of, 411; see also m 
rankings. 

Consistence, coefficient of, 425. 

Contingency, 318-22 ; coefficient of, 319. 

Continuity correction to χ², 303-6. 

Continuous frequency functions, 13; sampling 
from, 197. 

Continuum, probability in, 170. 

Co-operative Study on correlation coefficient, refs., 
363. 

Cornish, E. A., refs., moments and cumulants in 
specification of distributions, 160. 

Corrections for grouping in calculation of moments, 
30, 41; when distribution is abrupt, 79, 
refs., 85-6; see also Sheppard’s corrections. 

Correlation, coefficient of product-moment, generally, 
324-67; definition, 329; calculation 
of, 330-4; in bivariate normal distribution, 
334; sampling of, 336-48; standard error 
of, (Example 9.6), 211; Fisher’s transformation 
of, 346; tables of (David), 346. 

- coefficient of multiple correlation, 380-1 ; 

sampling of, 381-5. 

- coefficient of partial, 368-79; definition, 
370; in terms of coefficients of lower orders, 
372; geometrical interpretation, 372-3; 
examples of, (weather and crops, Example 
15.1), 373-6, (crime and religion, Example 
15.2), 375-6; in multivariate normal distribution, 
376-8; sampling distribution of, 
378-9. 

- intra-class, 358-62. 

-- rank, generally, 388-421; Spearman’s coefficient, 
388-91; sampling of, 394-403; 
coefficient τ, 391-3; sampling of, 403-8. 
See also m rankings. 





Correlation ratio, definition, 351; sampling distribution 
in uncorrelated normal population, 
352-3; relation with multiple correlation, 
381; for ranked data (Wallis), 437. 

Covariance, definition, 79; notation for, 204; 
calculation of, 330; distribution of in 
normal samples, 339-42. 

Cows, distribution of according to age and milk- 
yield, (Table 1.25), 27. 

Craig, C. C., corrections to moments of discrete 
distribution, 77, refs., 85 ; (Exercise 3.13), 
88; sampling of cumulants, 256, refs., 
285. 

Cramér, H., convergence of Gram-Charlier series, 
151-2, 159, refs., 160; central limit theorem, 
181-3; distribution of a ratio (Exercise 
10.8), 252; refs., Random Variables and 
Probability Distributions, 114, 183, 250. 

Crime and alcoholism, Goring’s data on, (Table 
14.6), 356. 

correlation with religion, (Example 15.2), 
376. 

Crop-reporters, bias in, (Example 8.4), 189-90. 

Crops, correlation with weather, (Example 15.1), 
373. 

Cuckoo’s eggs, distribution of length of, (Exercise 
14.13), 366. 

Cumulants, definition, 60; invariantive properties 
of, 61; relations with moments, 61-4; 
existence of, 64-5; calculation of, 65-8; 
in bivariate case, 80; Sheppard’s corrections 
to, 78 and (multivariate case) 80-1; 
generating functions for, 90; of normal 
distribution, 129. 

Cumulative function, 90. 

Curve of concentration, see Concentration. 

Dairy farms, distribution according to costs of 
milk-production, (Table 1.9), 9. 

David, F. N., distribution of difference of Type 
III variates, (Exercise 10.6), 252; Tables 
of the Correlation Coefficient, 345, refs., 
363. 

Davies, O. L., refs., estimation of standard deviation, 
228. 

Deaf-mutes, distribution of children of, (Table 
1.19), 24. 

Deaths, from scarlet fever, (Table 1.3), 6; distribution 
of, according to age at death, 
(Table 1.12), 11. 

Deciles, definition, 36; interdecile range, 38; 
standard errors of, 225. 

de Finetti, B., refs., calculation of mean difference, 
47. 

Degrees of freedom, in χ²-distribution, 292; in 
contingency table, 299. 

de la Vallée Poussin, C. J., refs., Cours d'analyse, 
footnote, 233. 



Demoivre, A., discoverer of normal distribution, 
131. 

Denjoy, A., theorem on uniqueness of quasi-analytic 
functions, (Exercise 4.10), 116. 

Dice, throws with, Weldon’s data (Table 1.14), 19; 
(Table 1.16), 23; (Example 8.10), 199; 
(Table 12.6), 301. 

Digits, distribution of, from telephone directory, 
(Table 1.4), 6. 

Direct probability, see Probability. 

Dirichlet integrals, in Inversion Theorem, 91-2. 

Discontinuous frequency-functions, 12. 

-variate, examples of distribution according 
to, 6-7. 

Dispersion, measures of, 38-48 ; see also Standard 
Deviation. 

Distribution curve, 36-7. 

-functions, 12-15; determined by characteristic 
function, 91-4; limiting properties of, 
99-104, 110-13; determination by moments, 
106-10; standard distributions, 
116-63, see also Standard Distributions; 
relation with probability, 172-3. 

Doodson, A. T., relation of mean, median and 
mode, 35, 46 ; refs., 47. 

Dörge, K., refs., axiomatisation of von Mises’ 
theory of probability, 183. 

Dressel, P. L., refs., seminvariants, 84-6, 286. 

Edgeworth, F. Y., citing Weldon’s dice data, 
(Table 1.14), 19 ; form of Gram-Charlier 
series, 148-9 ; refs., law of error, 160. 

Edwards, J., refs., Integral Calculus, footnote, 68 
and footnote, 221. 

Eells, W. C., formulae for probable errors of 
correlation coefficients, 410, refs,, 436. 

Egyptian skulls, distribution of, (Table 1.22), 
25. 

Elderton, E. M., data on health of son and alcoholism 
of father, (Exercise 14.12), 366. 

Elderton, Sir William P., Hardy’s method of calculating 
factorial moments, 59; corrections 
for moments when the distribution is symmetrical, 
85 and (Exercise 3.10), 87-8; 
fitting of Pearson distributions, 143; on 
Gram-Charlier series, 153; tables of χ², 293; 
refs., Frequency Curves and Correlation, 85, 
160. 

Error, standard, see Standard Error. 

Estimates, of proportions of attributes, 199-200 ; 
in large samples generally, 201-3; of a 
ranking, 421. 

Euler-Maclaurin sum formula, 69. 

Expectation, 84. See also Mean Values. 

Extreme values of sample, distribution of, 217-22. 

Eye-colour, relation with hair-colour, (Example 
12.3), 299; in parent and child, (Example 
13.4), 314. 




Factorial moments, 50^; definition, 50; in 
terms of ordinary moments, 57-8; calculation 
of, 58-60; Sheppard’s corrections 
to, 77-8; generating function for, 90; of 
binomial, 118; of hypergeometric, (Exercise 
5.4), 135. 

Families deficient in room space, distribution of, 
(Table 1.24), 27. 

Fathers, height of, distribution of sons according 
to, (Table 14.3), 327. 

Fay, E. A., data from Marriages of the Deaf in 
America, (Table 1.19), 24. 

Fecundity, distribution of brood-mares according 
to, (Table 1.20), 24. 

Feeding and teeth in infants, (Example 12.6), 304. 
Fieller, E. C., refs., distribution of a ratio, 250. 

Filon, L. N. G., refs. (under Pearson), standard 
errors of frequency constants, 229. 

Finite populations, sampling from, 283-4. 
Finney, D. J., sampling of variance ratio, (Exercise 
14.8), 365. 

Firms in the Food, Drink and Tobacco Trades, 
distribution of, (Table 1.17), 23. 

First hands at whist, distribution of, (Table 5.4), 
128 ; (Example 12.1), 299-300. 

First Limit Theorem, 100-1; converse of, 101-3. 

Fisher, Arne, modified form of Gram-Charlier 
series, 163; fitting of Type B to data 
(Example 6.4), 156; refs., Frequency 
Curves, 160. 

Fisher, R. A., Sheppard’s corrections, 75-7; 
introduction of word “cumulant,” 86; 
random sampling numbers, 194, 197; distribution 
of mean deviation, 216; distribution 
of extreme, 220; z-distribution, 
see z-distribution; k-statistics, 256, 268; 
measures of departure from normality, 
(Exercise 11.16), 289; tables of χ², 293; 
normal approximation to χ², 294-5; distribution 
of χ² when parameter estimated 
from data, 301; distribution of variances 
and covariance in normal samples, 340; 
transformation of correlation coefficient, 
345; distribution of multiple correlation 
coefficient (Exercises 15.0 and 15.7), 387; 
refs., moments and cumulants in specification 
of distributions, 160; mathematics of 
statistics, 184; inverse probability, 184; 
distribution of mean deviation, 228; distribution 
of extreme values, 228; distribution 
of correlation coefficient, 250, 363; 
distribution of well-known statistics, 250; 
applications of Student’s distribution, 250; 
k-statistics, 286; distribution of χ², 305; 
distribution of partial coefficients, 380; 
distribution of multiple correlation, 380. 

Food, Drink and Tobacco Trades, distribution of 
firms, (Table 1.17), 23. 


Footrule, 4|6. 

Franel, theorem on distribution of digits in 
mathematical tables, footnote, 193. 

Fréchet, M., proof of Second Limit Theorem, 112, 
113, refs., 114. 

Frequency (class-frequency), definition of, 2. 

Frequency-distributions, generally, 1-28; genesis 
of, 18. 

Frequency-functions, 12-16; discontinuous, 12; 
determined by characteristic function, 91-4; 
normalisation of, 156-9. 

Frequency-polygons, 4; bivariate form, 20. 

Friedman, M., tests of significance in m rankings, 
420, refs., 436. 

Frisch, R., moments of binomial, 68 and (Exercise 
5.3), 135; refs., moments and cumulants, 
85, 134; correlation analysis, 386. 

Galbrun, H., convergence of Gram-Charlier series, 
162. 

Galton, Sir Francis, data from Natural Inheritance, 
(Example 13.4), 314; correlation, 363. 

Galton’s ogive, see Distribution curve. 

Galvani, L., refs. (under Gini), median for qualitative 
characteristics, 47. 

γ₁, γ₂ (skewness and kurtosis), 82. 

Γ-function, in summing Poisson series, 122. 

Garwood, F,, data from, (Table 12.1), 297-8; 
refs., fiducial limits for Poisson distribution, 
306. 

Geary, R. C., distribution of measures of departure 
from normality, (Exercise 11.10), 289; 
distribution of a ratio, (Exercise 10.9), 253 
and refs., 260. 

Geiger, H., see Rutherford. 

Generating functions, for moments and cumulants, 
90. See also Characteristic functions. 

Geometric mean, see Mean, geometric. 

German women, distribution of suicides, (Table 
1.6), 7. 

Gilby, W. H., data from, on intelligence and 
clothing in schoolchildren, (Table 13.1), 320. 

Gini, C., coefficient of concentration, 43; coefficient 
of mean difference, 42; standard 
error, 210, 226; refs., mean difference, 47; 
median for qualitative characteristics, 47. 

Glossina morsitans (tsetse fly), distribution of 
trypanosomes in, (Table 1.13), 12. 

Goring, C., data on alcoholism and crime, (Table 
14.6), 366. 

Gosset, W. S., see “Student.” 

Grades, 408 ; relation with ranks, 408-10. 

Graduation curve, see Distribution curve. 

Grain, distribution of plots according to yield of, 
(Table 1.18), 23. 

Gram-Charlier series, Type A, 147-54; Edgeworth’s 
form, 148-50; fitting to bean 
data, (Example 6.2), 151; distribution of 
means from, (Exercise 10.7), 252; Type B, 
154-6; Type C, 160. 

Greenwood, M., data on industrial accidents, 
(Table 5.2), 124; inoculation against 
cholera, (Example 13.1), 309. 

Grouping of frequency-distributions, corrections 
for, see Sheppard’s corrections. 

Gumbel, E. J., distribution of mth values, 220-2 
and refs., 226. 


Haines, J., refs. (under Pearson), use of range, 228. 

Hair-colour, relation with eye-colour, (Example 
12.3), 299-300. 

Haldane, J. B. S., refs., cumulants and moments 
of binomial, 134 and (Exercise 5.1), 135; 
χ² with small expectations, 305; normalisation 
of frequency functions, 306 and (Exercise 
12.1), 306. 

Half-invariants (seminvariants), see Cumulants. 

Hall, Sir A. D., data on yield of grain, (Table 1.18), 
23. 

Hall, P., refs., distribution of mean from rectangular 
population, 260 ; multiple correlation, 386. 

Hamburger, H., problem of moments, 107 and 
refs., 114. 

Hardy, Sir G. F., calculation of factorial moments, 
59. 

Harmonic mean, see Mean, harmonic. 

Hartley, H. O., distribution of range, 224 and 
refs., 228. 

Health of son and alcoholism of parent, (Exercise 
14.12), 366. 

Height, distribution of men according to, (Table 
1.7), 8; frequency-polygon of, (Figure 1.3), 
8; mean, (Example 2.1), 30-1; median, 
(Example 2.4), 35; quartiles, (Example 
2.6), 36; distribution curve, (Figure 2.2), 
37; mean deviation and standard deviation, 
(Example 2.6), 39-40; mean difference, 
(Example 2.8), 45-6; factorial and 
ordinary moments (Example 3.7), 59-60; 
fitted to normal curves (Table 5.6), 132; 
standard error of mean, (Example 9.1), 207. 

-, in fathers and sons, (Table 14.3), 327; 
correlation, (Example 14.6), 337. 

-, distribution of plants according to, (Table 
8.1), 187. 

Heilman, M., data on teeth and feeding in infants, 
(Example 12.6), 304. 

Helly, W., theorems on convergent sequences of 
functions, 100, 112. 

Helmert, W., distribution of mean deviation, 216; 
distribution of sums of squares, 250, 305. 

Henderson, J., refs., expansion in tetrachoric 
functions, 160. 

Hermite, C., polynomials, 145, 160. See Tchebycheff-Hermite 
polynomials. 



Heron, refs. (under Pearson), coefficients of 
association, 322. 

Heterotypic frequency-distributions, 146. 

Highest audible pitch and age, bivariate distribution 
according to, (Table 14.1), 326; correlations 
and regressions, (Example 14.1), 
331; correlation ratios (Example 14.11), 
351-2. 

Hilferty, M. M., limiting distribution of χ², 294-6 
and refs. (under Wilson), 305. 

Hilton, J., refs., inquiry by sample, 202. 

Histogram, 4; bivariate, 20. 

Hojo, T., refs., distribution of median, quartiles 
and semi-interquartile range, 228. 

Homoscedastic distributions, 335. 

Hooker, R. H., data on weather and crops, 
(Example 15.1), 373 and refs., 386. 

Hotelling, H., distribution of Spearman’s ρ, 401 
and refs., 436. 

Hsu, C. T., sampling cumulants of normal distribution, 
276 and refs., 285. 

Hypergeometric distribution, generally, 126-8; 
moments of, 127; example of, (Table 5.4), 
128; factorial moments of, (Exercise 5.4), 
136; limiting forms of, 132-3. 

-function, 127. 

Hypothetical population, 187. 

Illusory association, 317. 

Income, distribution of persons by, (Table 1.2), 3 ; 
histogram of, (Figure 1.2), 4 ; distribution 
curve of, (Figure 2.1), 37. 

Incomplete moments, 43 ; of binomial, refs., 134 
and (Exercises 5.2, 5.3), 135. 

Independence, definition, 21 ; in association tables, 
309 ; in bivariate frequency tables, 326. 

Index, distribution of, see Ratio. 

Induction, in finding sampling distributions, 
246-8. 

Inequalities for moments, 56 ; refs. (Shohat), 86; 
Liapounoff’s, 56 and (Exercise 3.14), 88. 

Inoculation against cholera, see Cholera. 

-against tuberculosis in cattle, (Exercise 12.7), 
307. 

Intelligence, distribution of schoolchildren according 
to, (Example 13.6), 320. 

Interdecile range, 38. 

Interquartile range, 38 ; standard error of semi- 
interquartile range, (Example 9.8), 214. 

Interval (class-interval), see under Class. 

Intra-class correlation, 358-62. 

Inverse probability, 176; see Bayes’ theorem. 

Inversion theorem, 91-8 ; examples of use of, 
94-8. 

Irregular Kollektiv of von Mises, 171-2. 

Irwin, J. O., distribution of means, (Exercises 
10.3 and 10.4), 251 and refs., 251. 



J-shaped distribution, 10. 

Jackson, Dunham, on indeterminacy of median, 
46 and 47. 

Jeffreys, H., logic of probability, 165; refs., 
Theory of Probability, 184. 

Jensen, A., refs., representative method in 
statistics, 202. 

Johannsen, W., bean data cited by Pretorius, 
(Table 1.15), 20. 

Johnson, W. E., logic of probability, 166; refs., 
Logic, 184. 

Jordan, C., refs., Statistique mathématique, 160; 
Type B series (Exercises 6.4 and 6.6), 161-2. 

Jorgensen, N. K., tables of Tchebycheff-Hermite 
polynomials, 147, 161; refs., Undersøgelser 
over Frequensflader og Korrelation, 160. 

k-statistics, definition 256; general properties, 
256-60; sampling cumulants of, 260-89; 
in multivariate case, 281-3. 

K, as criterion of type in Pearson distributions, 140. 

Kelley, T. L., tables of χ², 293; tables of correlation 
coefficient, 376 and refs., 386; refs., 
Kelley Statistical Tables, 386. 

Kendall, M. G., data from, (Table 1.4), 6; Sheppard 
corrections, 75; multivariate cumulants, 
80; randomness, 172; maximum 
likelihood, 179; data from, (Table 8.3), 189; 
Random Sampling Numbers, 193, 197; 
refs., Sheppard corrections, 86; multivariate 
sampling formulae, 85; randomness, 
184; maximum likelihood, 184; 
randomness and random sampling numbers, 
202; k-statistics, 286; rank correlation 
and paired comparisons, 430. 

Kendall, S. F. H., refs., distribution of Spearman’s 
ρ, 436. 

Keynes, J. M., (now Lord Keynes), on probability, 
166; refs., Treatise on Probability, 184. 

Kiser, C. V., refs., pitfalls in sampling, 202. 

Koga, Y., data from, (Table 14.1), 326. 

Kollektiv of von Mises, 171-2. 

Kolmogoroff, A., probability as abstract ensembles, 
165; refs., Grundbegriffe der Wahrscheinlichkeitstheorie, 
184. 

Kondo, T., refs., standard error of mean square 
contingency, 321, 322. 

Kullback, S., refs., distributions and characteristic 
functions, 251, 363; and (Exercises 10.2 
and 10.6) 261, 252, (Exercise 14.7), 364-6. 

Kurtosis, 82. 

Lagrange, J. L., distribution of mean from rectangular 
population, 250. 

Laplace, P. S. (Marquis de), characteristic functions, 
113; continued fraction for the 
normal distribution, 129-30; succession 
rule, (Example 7.7), 177; early work on 
Central Limit Theorem, 180. 

Large samples, approximations in theory of, 201-2. 
See Standard Error. 

Laterality of hand and eye, (Exercise 13.6), 323. 

Latter, O. H., data on length of cuckoo’s eggs, 
(Exercise 14.13), 366. 

Lawley, D. N., sampling cumulants of k-statistics, 
276 and refs. (under Hsu), 286. 

Least squares, in determination of regression lines, 
328-9, 368. 

Lee, A., data from, on fecundity of mares, (Table 
1.20), 24; on stature of fathers and sons, 
(Table 14.3), 327; refs., sampling of correlation 
coefficient, (under Co-operative Study), 
363. 

Leibniz, G. W., logic of probabilities, 165. 

Leptokurtosis, 82. 

Lévy, P., refs., Calcul des Probabilités, 22, 184; 
characteristic functions, 113, 114. 

Liapounoff, A., inequality for moments, 66 and 
(Exercise 3.14), 88 ; proof of Central Limit 
Theorem, 180, 183 ; refs., limit theorems in 
probability, 184. 

Likelihood, 176; principle of maximum likelihood, 
178-80; in estimating proportion of attributes, 
199-200; relation with Bayes’ 
theorem, 178-80 and (Exercise 8.6), 203. 

Limit theorems for distributions, see First Limit 
Theorem, Second Limit Theorem. 

Lindeberg, J. W., condition for validity of Central 
Limit Theorem, 181. 

Linear regression, 327-9, 368-76. 

Location, measures of, 29-38. See Mean, etc. 

Lottery sampling, 192. 

m rankings, 410-21. 

Macaulay’s essays, distribution of sentence length 
in, (Table 1.21), 26. 

Male births, distribution of registration districts 
according to, (Table 14.2), 326 ; constants 
of, (Example 14.2), 364. 

Malocclusion of teeth in infants, (Example 12.6), 
304. 

Markoff, A., refs., Second Limit Theorem, 113. 

Marriages, distribution of Australian, see Australian; 
of deaf in America, (Table 1.19), 
24. 

Martin, E. S., refs., corrections to moments, 85. 

Maximum likelihood, 178. See Likelihood. 

Mean, arithmetic, definition of, 29; properties of, 
32; relation with median and mode, 36, 
46; as first moment, 39; standard error 
of, 224; distribution of, in normal samples, 
(Example 10.5), 238-9; in rectangular 
samples (Examples 10.7 and 10.12), 240-2, 
244; in Poisson distribution, (Example 
10.9), 243; in binomial, (Example 10.8), 
243; in Type III distribution (Example 
10.11), 244. 

Mean deviation, about mean, 38; about median, 
38; standard error of, 215. 

-difference, 42; calculation of, 46 and (Exercise 
2.10) 48; standard error, 210-17, 225. 

-, geometric, 32; less than arithmetic mean, 
33-4; distribution of, 246-0; from rectangular 
population, (Example 10.13), 245; 
from Type III distribution (Exercise 10.2), 
251. 

-, harmonic, 32; less than arithmetic and 

geometric means, 33-4. 

—— square contingency, 319. 

-values, 84 ; in sampling problems, 264-6. 

Measures of location, dispersion and skewness, 
29-48, 81-2. 

Median, 34; relation with mean and mode, 36, 
46 ; standard error of (Example 9.7), 213, 
226. 

Mehler, G., refs., expansion in tetrachoric series, 
363. 

Mendelian law, test of, as sampling of attributes, 
197-8; in pea breeding, (Example 12.2), 
299. 

Mercer, W., data from, (Table 1.18), 23. 

Merzrath, E., refs., bivariate frequency-distributions 
and correlation, 86. 

Mesokurtosis, 82 ; in normal distribution, 129. 

Milk, costs of production of, (Table 1.9), 9. 

Milk-yield, distribution of cows according to, 
(Table 1.25), 27 ; covariance and variances, 
(Exercise 14.1), 364. 

Milne-Thomson, L. M., Calculus of Finite Differences, 
footnote, 69. 

Miner, J. R., tables of correlation coefficients, 376 
and refs., 386. 

Mises, R. von, probability as limit in sequences, 
166, 171-2; refs., Wahrscheinlichkeit, Statistik 
und Wahrheit, 184. 

Mode, 36; relation with median and mean, 35, 
46 ; standard error in Pearson distributions, 
225. 

Moments, preliminary, 39; Sheppard’s corrections 
to, 41; definition, 49; about one point in 
terms of those about another, 49; calculation 
of, 60-4; generating functions for, 
64-6, 90; absolute moments, see Absolute; 
factorial moments, see Factorial; in terms 
of factorial moments, 67-8; relationship 
with cumulants, 61-4; corrections for 
grouping, 68-78; multivariate, 79-80; 
corrections to multivariate, 80-1; as 
characteristics of a distribution, 83-4; 
problem of moments, 105-10; of binomial, 
117, 118; of hypergeometric, 127; of 
normal distribution, 129; standard errors 
of, 204-11, 226; distribution of, 246. See 
also Sheppard’s corrections, Second Limit 
Theorem, Cumulants. 

Montel, P., theorem on convergent sequences of 
functions, 100. 

Moore, G., data from, (Table 1.20), 24. 

Morant, G., refs., random occurrences in space and 
time, 134; data from (Table 14.1), 325. 

mth values, distribution of, 217-22. 

Multiple correlation, see Correlation. 

Multivariate: distributions, 19-22; normal distribution, 
376-7; sampling distributions, 
260; correlation, see Correlation; moments 
and cumulants, 79-81; characteristic 
functions, 104-6; k-statistics, 281-3. 

Nair, U. S., distribution of mean difference, 216, 
225 and refs., 228. 

Neyman, J., on theory of estimation, footnote, 
180; refs., estimation, 184; representative 
method, 202; sampling from finite 
population, 284, 285. 

Nicholson, C., refs., distribution of a ratio, 251. 

Normal distribution, generally, 128-32; moments 
of, (Example 3.4), 53-4; cumulants of, 
(Example 3.10), 67; providing standard of 
kurtosis, 82; characteristic function of, 
(Example 4.1), 94; as limit of binomial, 
(Example 4.6), 103; determined uniquely 
by its moments, (Example 4.7), 109-10; 
as limit of Poisson distribution, (Example 
4.8), 113; distribution function of, 129-30; 
as one of Pearson’s types, 141; in Central 
Limit Theorem, 180-3; in sampling of 
attributes, 198-9; distribution of mean in 
samples from, (Example 10.2), 234-6, 
(Example 10.3), 236-7, (Example 10.10), 
243; distribution of variance in samples 
from, (Example 10.6), 238-9; sampling of 
k-statistics from, 274; distribution of 
measures of departure from, (Exercise 
11.16), 288; bivariate form, see Bivariate; 
multivariate form, 376-7. 

Normalisation of frequency-functions, 156-9. 

Norris, N., refs., inequalities among averages, 47. 

Norton, J. P., data from Statistical Studies in the 
New York Money Market, (Table 1.26), 28. 

Ogburn, W. F., correlation of crime and religion, 
(Example 15.2), 375 and refs., 386. 

Ogive of Galton, see Distribution curve. 

Oldis, E., refs., significance of correlation coefficient, 
(under E. S. Pearson), 363. 

Pabst, M. R., distribution of Spearman’s ρ, 401, 
and refs. (under Hotelling), 436. 

Paciello, U., refs., calculation of mean difference, 
47. 





Paired comparisons, 421-36. 

Pairman, E., refs., corrections to abrupt distributions, 
85. 

Parameters, definition, 3^^; of location, 29-38; 
of dispersion, 38-48. 

Partial: association, 313-18; contingency, 321-2; 
correlation, see Correlation; regression, see 
Regression. 

Pattern functions, in sampling k-statistics, 202-5, 
277-8, 279, (Exercise 11.11), 287. 

Pea breeding, (Example 12.2), 299. 

Pearce, T. V., data from, (Table 1.23), 26. 

Pearse, G. E., data from, (Table 1.11), 10; refs., 
corrections when ordinates are infinite, 86. 

Pearson, E. S., distribution of range, 223, 224; 
sampling of correlation coefficient, 346; 
distribution of √b₁, 280-1, (Exercise 11.17), 
289; refs., range, 228; estimating standard 
deviation, 228; distribution of frequency 
constants in skew population, 228; tests 
for normality, 285; correlation coefficient, 
363; polychoric coefficients, (under K. 
Pearson), 363. 

Pearson, Karl, data from: trypanosomes, (Table 
1.13), 12 ; fecundity of mares, (Table 1.20), 
24; whist deals, (Table 5.4), 128; height 
of fathers and sons, (Table 14.3), 327; 
quoting data by Goring on crime, (Table 
14.6), 356; quoting data by Elderton on 
alcoholism, (Exercise 14.12), 366. 

Coefficient of variation, 43 ; measure of
skewness, 81 ; coefficient of contingency,
319-20; sampling of contingency coefficients,
321; sampling of tetrachoric r, 366, and of
biserial η, 368 ; grades and Spearman’s ρ,
410.

Refs., corrections to abrupt distributions
(under Pairman), 86 ; skew variation, 134 ;
moments of hypergeometric, 134 ; 15-constant
frequency surface, 160; standard
errors of frequency constants, 228-9 ; mean
character of ranked individual, 229 ; distribution
of χ², 261, 306 ; of difference of Type
III variates, (Exercise 10.6), 262 ; sampling
of contingency coefficients, 322; multiple
contingency, 322 ; sampling of correlation
coefficient, (under Co-operative Study), 363;
probable error of biserial η, 363 ; rank
correlation, 436.

Pearson, M. V., refs., mean character of ranked
individuals, 229. 

Pearson distributions, as limit of hypergeometric,
132-3 ; generally, 137-46 ; recurrence relation
for moments, 138 ; skewness of, 138 ;
inflections of, 138; fitting of, 143-6;
quadrature of, 146; generalisation by
Romanovsky, refs. 160, 161 ; distribution
of means from (ref. Irwin), 260.


Pitman, E. J. G., significance test applicable
to samples from any population, 436.

Platykurtosis, 82. 

Poincaré, characteristic functions, 113.

Poisson distribution, generally, 120-3 ; cumulants
of, (Example 6.9), 66; moments of, (Exercise
3.3), 86; normal distribution as limiting
form of, (Example 4.8), 113; distribution
function of, 122 ; in mixed populations,
122-4 ; bivariate form, (Exercise 6.8), 136 ;
sampling of attributes from, (Exercise 8.2),
203 ; distribution of means from, (Example
10.9), 243.

Polynomials, see Tchebycheff-Hermite polynomials.

Populations, as basis of statistical theory, 1; 
existent, 18-19; hypothetical, 19; types 
in sampling, 186-7. 

Posterior probability, 176. 

Potatoes, bias in estimates of yield, (Example 8.4), 
189-90. 

- and wheat, correlation of yields, (Table
14.4), 333, (Example 14.3), 332-4.

Pretorius, S. J., data from, on Australian marriages,
(Table 1.8), 9; on beans, (Table 1.13), 20
and (Table 6.1), 160; refs., skew bivariate
distributions, 160.

Principle of Maximum Likelihood, 178. See
Likelihood.

- of moments, 83 ; in fitting Pearson’s
distributions, 143.

Prior probability, 176. 

Probability, generally, 164-85; logic of, 165;
basic rules of direct probability, 166-70 ;
in a continuum, 170-1 ; von Mises’ approach,
171-2 ; and statistical distributions,
172-3; Bayes’ theorem, 175-8; inverse
probability, 176 ; posterior and prior, 176.

- functions, 14.

Problem of moments, 105-10; refs. 113-14. 

Product-moment correlation, see Correlation.

Quadrature of Pearson distributions, 145.

Quantiles, definition, 36; graphical determination
of, 37-8; standard errors of, 211-13.

Quartiles, definition, 36 ; interquartile range as 
measure of dispersion, 38 ; standard errors 
of, 225. 

Radioactive element (polonium), distribution of
particles from, (Table 6.2), 155, (Example
6.4), 156.

Ramsey, F. P., logic of probability, 165; refs.,
The Foundations of Mathematics, 184.

Random variables, definition, 173 ; addition of,
173. 

- Sampling Numbers, 192-7. 

Randomising machine, (Example 8.3), 180.





Randomness, 187; random sampling, generally,
186-209; technique of,

Range, definition, 38; distribution of, 223-4.

Rank, see Correlation.

Ranking, estimation of, 421.

Rankings, problem of m, see m rankings.

Ratio, distribution of, 248-9; Cramér’s theorem
(Exercise 10.8), 252; Geary’s theorem
(Exercise 10.9), 263 ; refs., 260-1.

Rectangular population, transformation of
frequency-distribution to, 18; as one of
Pearson’s distributions, 142; distribution
of mean of samples from, (Example 10.7),
240 and (Example 10.12), 244 ; distribution
of geometric mean in samples from,
(Example 10.13), 245-6.

Recurrence relations for moments of binomial, 118.

Registrar-General’s Statistical Review of England 
and Wales, data from, (Table 1.1), 3; 
(Table 1.3), 6; (Table 1.12), 11.

Registration districts, distribution according to 
births, (Table 14.2), 326. 

Regression, definition, 327-9 ; coefficients of, 329 ;
criterion for linearity of, 335-6 ; sampling
of coefficients of, 336-7, 347-9 ; standard
error of coefficients, 337 ; significance of,
358-9; partials, 368-79 ; sampling of
partials, 378-9.

Religion, correlation with crime, (Example 15.2),
375-6. 

Reserves and bank deposits, distribution of, 
(Table 1.26), 28. 

Residuals, in regression equations, 369. 

Ritchie-Scott, A., refs., correlation coefficient of 
polychoric table, 363. 

Romanovsky, V., refs., method of moments, 86 ;
moments of hypergeometric, 134 and (Exercise
5.2), 136; generalisation of Pearson
distributions, 160.

Room-space, distribution of families deficient in, 
(Table 1.24), 27. 

Rothamsted Experimental Station, data from, 
(Table 8.1), 187. 

Rutherford, Lord, data on emission of radioactive
particles, (Example 6.4), 156.

St. Georgescu, N., refs., sampling moments, 285. 

Saltus, in distribution function, 14. 

Sampling, preliminary, 174 ; simple, 174 ; random 
sampling, see Random ; sampling problem, 
186 ; with and without replacement, 186-7 ; 
randomness in, 187-97 ; lottery or ticket, 
192; from continuous population, 197; 
from attributes, 197-202. 

- distributions, 173-5 ; role in sampling
problems, 201 ; exact, 231-53 ; derivation
by analytical methods, 231-6, by geometrical
methods, 236-42, by characteristic
functions, 242-6, by induction, 246-8; of
a sum, 246-7; of a ratio, 248-9 ; multivariate,
250; approximations to, 264-89.

Sampling moments, generally, 264-89. See
Cumulants, k-statistics.

Scale reading, bias in, (Example 8.2), 188.

Scarlet fever, deaths from, (Table 1.3), 5. 

Schoolchildren, distribution according to
intelligence and clothing, (Example 13.0), 320.

Second Limit Theorem, 110-13. 

Semi-interquartile range, as measure of skewness,
38; standard error of, 216. 

Seminvariant statistics, 84-6, 266, refs. (Dressel 
and Kendall), 286. 

Seminvariants, 61, 84-6, refs., 84-6. See
Cumulants.

Sentences, distribution of according to length, 
(Table 1.21), 25. 

Sheppard, W. F., tables of normal distribution,
130 and refs., 134; correlation coefficient,
(Exercise 14.4), 364.

Sheppard’s corrections, 68-74; as average
corrections, 74-6; for discrete data, 77,
(Exercise 3.13), 88 ; to factorial moments,
77-8 ; to cumulants, 78; multivariate
case, 80-1; compared with sampling
fluctuations, 210.

Shirley poppies, distribution of, (Table 1.6), 7. 

Shohat, J., refs., Stieltjes integrals, 22 ; inequalities
for moments, 86 ; Second Limit Theorem,
112, 113, 114.

Shuffling of cards, see Card-shuffling.

Simple sampling, 174. 

Skew distributions, 10. 

Skewness, 10; measures of, 81-2; of Pearson 
distributions, 138 ; standard error of, 225.

Skulls, Egyptian, distribution of, (Table 1.22), 26. 

Sons, distribution of according to stature, (Table 
14.3), 327. 

Soper, H. E., refs., Frequency Arrays, 134; 
sampling of correlation coefficient (under 
Co-operative Study), 363. 

Spahlinger vaccine, data on, (Exercise 12.7), 307. 

Spearman, C., coefficient of rank correlation,
388-91 ; sampling of, 394-403 ; foot rule,
436; refs., rank correlation, 436.

Square contingency, 319. See χ².

Standard deviation, 39 ; standard error of, 224. 

- Distributions, 116-36, 137-63. See under
Binomial, Hypergeometric, Poisson, Normal,
Pearson distributions, Gram-Charlier
series, Normalisation of frequency-functions.

- errors, 199 ; in attributes, 199-201 ; generally,
204-30; compared with Sheppard
corrections, 210; of sum and difference,
226. (For standard errors of particular
statistics, see under those statistics.)




Standard measure, 40 ; effect on cumulants of
transformation to, 61; on characteristic
function of transformation to, (Example 4.6),
103.

Statistic, definition of, 2.

Statistical hypothesis, 178.

Statistical Abstract, data quoted from, (Table 1.2),
3.

- Review of England and Wales, data from,
(Table 1.1), 3 ; (Table 1.3), 6 ; (Table 1.12),
11.

- Studies in the New York Money Market
(Norton), 28.

Statistics, definition of, 1-2. 

Stature, see Height.

Steffensen, J., on Type B series, 168, 164; refs.,
Recent Researches in the Theory of Statistics
and Actuarial Science, 161.

Stereogram, 20-1. 

Stieltjes, J., problem of moments (Exercise 3.12), 
88, 106-7, 109; refs., 114. 

- integrals, 15-16; refs. (Shohat), 22.

Stigmatic rays, distribution of poppies according
to, (Table 1.5), 7. 

Stouffer, K. A., distribution of difference of Type 
III variates, (Exercise 10.6), 262.

“Student” (W. S. Gosset), refs., Poisson
distribution, 134 ; probable error of mean,
261 ; sampling of Spearman’s coefficients 
of rank correlation, 436. 

“Student’s” distribution, (Example 10.6), 239-
40; (Example 10.17), 248; in testing 
correlation coefficient, 343 ; in testing 
Spearman’s p, 401 ; in testing regression 
coefficients, 349. 

Succession rule of Laplace, (Example 7.7), 177. 

Suicides, distribution of, (Table 1.6), 7. 

Sum of two variates, distribution of, 246-7.

Sur-tax and super-tax, distribution of incomes 
liable to, (Table 1.2), 3 and histogram
(Fig. 1.2), 4. 

t-distribution, see “Student’s” distribution.

Tchebycheff, P. L., problem of moments, 114;
inequality, (Exercise 8.4), 203. 

Tchebycheff-Hermite polynomials, 146-7 ; refs.,
160. 

Teeth and feeding in infants, (Example 12.6), 304. 

Telephone directory, distribution of digits from, 
(Table 1.4), 6, 193.

Term of abortion, distribution of women according 
to, (Table 1.23), 26. 

Tetrachoric functions, 161, 366. 

- r, 364-6. 

Thiele, T. N., cumulants, 61 ; quotation about 
oracles, 178; sampling cumulants, 266 ; 
refs., Theory of Observations, 86, 286.

Thompson, C., tables of χ²


Ticket sampling, 192.

Tippett, L. H. C., Random Sampling Numbers,
193, 197 ; distribution of extreme values,
220 and refs., 229; distribution of range,
223-4 and refs., 229.

Tocher, J., data from, (Table 1.26), 27.

Transformation of a variate, 16, 21-2.

Trigonometrical representation of correlations,
372. 

Truncated distributions, 11. 

Trypanosomes, distribution of, (Table 1.13), 12. 

Tschuprow, A. A., sampling moments, 266;
coefficient of contingency, 320 ; refs., sampling
moments, 284, 286.

Tsetse flies, distribution of trypanosomes in, 
(Table 1.13), 12. 

Type A, Type B series, see Gram-Charlier series.

Type I distribution, 139-40. 

- II distribution, 141-2 ; distribution of means
from (Exercise 10.4), 261.

- III distribution, characteristic function and
moments of, (Example 3.6), 65-6 ; cumulants
of, (Example 3.11), 67; generally,
142 ; as sampling distribution of sum of
variances, 231-3; distribution of means
from (Example 10.11), 244 ; of geometric
means from, (Exercise 10.2), 261 ; distribution
of differences from, (Exercise
10.6), 262.

- IV distribution, 140-1.

- V distribution, 141 ; moments and cumulants,
(Exercise 3.12), 67-8.

- VI distribution, 140. 

- VII distribution, 142 ; moments of, (Exercise
3.1), 86.

Types VIII-XII distributions, 142-3.

U-shaped distributions, 10-11. 

Unbiased estimates, 200. 

Unimodal distributions, 29. 

Uspensky, J. V., refs., Central Limit Theorem,
183 ; Introduction to Mathematical Probability,
184, 261.

Variable, random, see Random variable.

Variance, 39 ; as half mean-square of differences, 
42 ; standard error of, 224 ; distribution 
of, in normal samples, (Example 10.6), 
238-9, (Example 10.14), 246; of second 
mean-moment, (Example 11.2), 266; third 
moment of, (Example 11.3), 266. 

Variate, definition of, 2; transformations of, 
16-18, 21-2. 

Variation, coefficient of, 43 ; standard error of, 
(Example 9.6), 209, 224. 

Venn, J. A., refs., Logic of Chance, 184. 

Vigor, H. D., data from, (Table 14.2), 326. 





Wallis, W. A., refs., correlation ratio for ranked
data, 437 and (Exercise 16.3), 437.

Weather, correlation with crime, (Example 15.1),
373-4.

Weierstrass, K., diagonal process, 100; theorem
on series of polynomials, (Exercise 4.7), 116.

Weight, distribution of men according to, (Table
1.10), 10.

Weldon, W. F. R., dice-throwing data, (Table 1.14),
19; (Table 1.16), 23; (Table 5.1), 117;
(Example 8.10), 199.

Wheat, correlation of yields with potatoes, (Table
14.4), 333 ; (Example 14.3), 332-4.

- plants, distribution of ranks according to
height, (Table 8.1), 187.

Whist, distribution of first hands at, (Table 5.4),
128; (Example 12.1), 297-8. 

Whitaker, L., refs., Poisson distribution, 134.

Wicksell, S. D., example from, (Example 14.4), 336.

Willcox, W. F., definitions of statistics, 1, refs., 22.

Wilson, E. B., limiting distribution of χ², 294-6
and refs., 305.

Wishart, J., introduction of word “cumulant,”
86; refs.: Romanovsky’s generalisation
of Pearson distributions, 161 ; derivation
of pattern formulae (under Fisher), 286;
sampling cumulant formulae, 285; distribution
of multiple correlation and correlation
ratios, 386.

Wold, H., Sheppard’s corrections, 71, 80 ; refs., 86.

Women, distribution of according to term of
abortion, (Table 1.23), 26.

Woo, T. L., data from, on skulls (Table 1.22), 26;
on association of hand and eye, (Exercise
13.5), 323 ; tables of correlation ratio, 354.



Yasukawa, K., refs., standard error of mode,
226.

Yates, F., data from, height of plants, (Table 8.1),
187; Random Sampling Numbers, 194,
197 ; tables of χ², 293 ; correction to
χ² for grouping, 303 and (Example 12.3), 304;
refs., bias in sampling, 202 ; correction to
χ², 306.

Yield of grain, distribution of, (Table 1.18), 23; 
of wheat and potatoes, correlation of, 
(Table 14.4), 333, (Example 14.3), 332-4.

Young, A. W., refs., sampling of correlation
coefficient (under Co-operative Study), 363.

Yule, G. Udny, data from, poppies, (Table 1.6), 7 ;
sentence length, (Table 1.21), 25 ; industrial
accidents, (Table 6.3), 124; prints on
photographic paper, (Exercise 12.8), 307 ;
inoculation against cholera, (Example 13.1),
309; births and registration districts,
(Table 14.2), 326; data compiled from
Fay, (Table 1.19), 24.

Negatively indexed binomials, (footnote),
126; normal distribution (Exercise 6.6),
136 ; bias in scale-reading, 188 ; tables of
χ², 293 ; coefficients of association, 310-13 ;
refs., reading a scale, 202; degrees of
freedom in contingency tables, 305; theory
of correlation, 363, 386, 386.


z-distribution, (Example 10.18), 249; in testing
correlation ratio, 363-4 ; in testing multiple
correlation coefficient, 381-2 ; in testing
concordance in rankings, 419.








